What's in the Box?: A descriptive analysis of the Philippine importation

Title.png

Executive Summary

The Bureau of Customs (BOC), an attached agency of the Department of Finance, was established in 1902. It is the second largest revenue-collecting agency of the government after the Bureau of Internal Revenue. Republic Act No. 10863, also known as the Customs Modernization and Tariff Act, approved by former President Benigno S. Aquino III on May 30, 2016, is the latest law guiding the mandates and functions of the Bureau. Under this law, the Bureau is responsible for assessing and collecting customs revenues, controlling illicit trade and customs fraud, and facilitating trade. This law also explicitly affirms, in black and white, that the Bureau shall utilize information and communications technology to perform its functions. The CMTA implementation occurred during the time of former President Rodrigo Roa Duterte (PRRD) and several ICT online systems were implemented by BOC to curb graft and corruption and increase the collection of lawful revenues through duties and taxes.

Considering the changes in diplomatic policies, tax laws, and modernization initiatives in PRRD's administration, the team highlighted the following question as the center of this study:

"How did the Philippine importation perform during the Duterte administration?"

This study aims to identify the trends, patterns, and insights regarding the Philippine Importations from 2015 to 2022, especially those years during PRRD's Term, and to determine imported rice consumption patterns of Filipinos between 2015 (before RTL) and 2020 (a year after RTL implementation).

To aid our analysis, the team applied a methodology involving data extraction, data cleaning and processing, and the use of Exploratory Data Analysis (EDA) to guide the team for the questions that would be explored, analyzed, and answered at the end of this study.

The Duterte administration was known for its fight against illegal drugs, criminality, and corruption. The Philippine importations during the first and second half of PRRD's term showed an opposite trend between dutiable value and duties and taxes. From 2016 to 2018, the total dutiable value had an average growth rate of 25% but the total duties and taxes remained flat at 14%. However, from 2019 to 2022, the total dutiable value showed an average decrease of 1%, while the total duties and taxes posted a 19% increase. The increasing and all-time-high breaches in total duties and taxes assessed were mainly due to the appointment of the new BOC Commissioner Guerrero in 2019 amidst the COVID-19 pandemic. Guerrero's implementation of his 10-point priority program that included fully automating the transactions in the frontline and the pandemic made it urgent for everyone to immediately embrace the online filing and processing initiatives of the BOC. This resulted in a notable and remarkable increase in the total duties and tax assessments and a reduction in graft and corruption. Such a turn of events also resulted in another milestone where the total duties and taxes assessed after June 2020 breached 68 billion and never went below it again. This is truly a good example of digitalization initiatives that should be prioritized and capitalized on by the current administration to continue the positive and increasing trend in duties and taxes that the government collects.

When PRRD took office in 2016, he pledged to shift the Philippines away from the United States in favor of China and Russia. However, towards his last two years in office, he had a change of heart that can probably be dated to as early as June 2020 because of the amped-up assertiveness of Beijing in the South China Sea like the incidents involving hundreds of Chinese militia boats.

From 2015 to 2022, importations from China grew massively exceeding the 1 trillion-peso total dutiable value in 2018. It peaked in 2020 but started decreasing since then. On the other hand, the importations from the United States remained stable between the 300 to 500 billion annual total dutiable value. The total dutiable value of importations from Japan was a far second from China reaching only 800 billion pesos in 2020. Importations from Korea is ranked third in total dutiable value since 2015.

The electrical machinery and mineral fuels chapters have consistently bagged the top two spots since 2015. Electrical machinery chapter was largely imported from China so the effective duties and taxes were lower than that of the mineral fuels chapter since the latter had higher duties and excise tax rates, in addition to VAT.

The Rice Tariffication Law (RTL) took effect during PRRD’s administration in March 2019. Analysis of the data showed that rice importations from Viet Nam increased since the implementation of RTL. In 2015, there were 16.85 kilograms of imported rice for every person in the country. This proportion increased to 20.55 kilograms in 2020.

Imported rice from Pakistan and Myanmar has lower importation costs than Viet Nam and Thailand rice. Their average cost is around 20 pesos per kilogram while Viet Nam and Thailand rice are in the 23-25 peso range.

After summarizing the results of the team's thorough analysis of the available importation and census data, the team has arrived at the following recommendations to the BOC: improve data quality by adding stricter and more robust validation checks, evaluate customs memorandum orders carefully, continue implementing and improving existing BOC's modernization projects, and use Application Programming Interface to enable real-time public sharing of accurate BOC reports. The team also highlighted the following items for further studies/analysis: encourage importers to consider importing rice from countries offering lower effective rice costs such as Pakistan and Myanmar, boost trade with countries other than China, focus on the top 10 chapters when considering changes in trade-related government operations and tax law creation/modification since such chapters take at least 70% of the total imports, and finally redirect importations when there are cheaper but same quality alternatives to help drive down the cost of products in the Philippines.

Problem Statement

How did the Philippine importation perform during the Duterte administration?

Motivation

It’s public knowledge that Duterte made a considerable change to foreign policy by re-positioning the Philippines' diplomatic relations to more friendly and positive relations and engagement with China. We would like to know how this affected the importation of goods in the Philippines during the time of former President Rodrigo Roa Duterte (PRRD). Also, certain laws related to importation were implemented during his administration such as the Rice Tariffication Law (RTL). The RTL replaced the quantitative restrictions on imported rice with tariffs of 35 to 40 percent and established the Rice Competitiveness Enhancement Fund (RCEF) funded by the tariff revenues. We would like to know how this has changed rice importation and its relation to the population growth of the Philippines from 2015 to 2020.

Methodology

The high-level methodology used in this study is as follows:

No. Step Description
1. Data extraction Obtain Philippine Customs importation data from 2015 to 2022 and Philippine Census data for years 2015 and 2020 from jojie-collected public datasets (directory: /mnt/data/public) and store those in sqlite3 databases.
2. Data cleaning Prepare, clean, and process the collected data accordingly to get the relevant data subsets and columns.
3. Data processing Remove or add necessary columns for further analysis and create functions in preparation for EDA.
4. Exploratory Data Analysis (EDA) Provide a list of objectives and questions that would be explored, analyzed, and answered in the subsequent sections pertaining to trends, patterns, and insights regarding the Philippine Importation.

The detailed steps performed related to the above methodology is presented in the Data Exploration and Results and Discussion sections of this document.

Data Source and Description

IMPORTATION DATA:

The source of the importation data is the Bureau of Customs' website's customs processing system called the E2M (Electronic-to-Mobile). It contains downloadable excel files that were scraped and made available to the team via the jojie-collected public datasets (directory: /mnt/data/public/customs) of the Asian Institute of Management (AIM).

The specific data used for this study includes all the monthly excel files from January 2015 to September 2022 which contain 33,293,684 rows and varying numbers of columns depending on the report available. Though there were more columns present in such files, only the following columns were used and considered relevant for this study:

Column Name Data Type Short description
MONTH_YEAR TEXT Assessment month and year based on the filename
HS_CODE TEXT 11-digit Harmonized System (HS) code. This contains identification codes given to goods for use in international trade
COUNTRY_EXPORT TEXT Name of the Exporting Country
PREFERENTIAL_CODE TEXT Code used for preferential treatment of importation based on Free Trade Agreement
DUTIABLE_VALUE_FOREIGN REAL Financial value of the shipment based on invoice in foreign currency
CURRENCY TEXT Currency of the dutiable_value_foreign used by the importer
EXCHANGE_RATE REAL Exchange rate used to convert the foreign value to Philippine Peso
DUTIABLE_VALUE_PHP REAL Value of the shipment in Philippine Peso
DUTIES_AND_TAXES REAL System-calculated duties and taxes
EXCISE_ADVALOREM_PAID REAL System-calculated excise tax
VAT_BASE REAL The landed cost plus excise taxes, if any
VAT_PAID REAL System-calculated Value-Added Tax
NET_MASS_KGS REAL Net weight of the shipment in kilograms
GOODS_DESCRIPTION TEXT Description of the goods
CHAPTER TEXT Description of the HS Code

PHILIPPINE CENSUS:

The source of the census data is the Philippine Statistics Authority. It also contains downloadable excel files that were scraped and made available to the team via the jojie-collected public datasets (directory: /mnt/data/public/customs) of AIM.

For this study, the team selected only the Table A - Population and Annual Growth Rate for the Philippines and its Regions, Provinces, and Highly Urbanized Cities file and used the Philippines' total population data for the years 2015 and 2020.

Data Exploration

Import Libraries¶

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from numpy import arange
import sqlite3
import pickle
import re
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import warnings
import plotly.express as px

c_blue = '#0038a8'
c_yellow = '#fcd116'
c_red = '#ce1126'
c_gray = '#9c9a9d'
c_black = '#000000'
c_green = '#3cb043'

pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.max_rows', None)
pd.set_option('display.max_colwidth', None)
pd.set_option('mode.chained_assignment', None)

# plt.rcParams['font.family'] = 'sans-serif'
# plt.rcParams['font.sans-serif'] = 'DejaVu Sans'
plt.rcParams['axes.edgecolor'] = '#9c9a9d'
plt.rcParams['axes.linewidth'] = 0.8
plt.rcParams['xtick.color'] = c_black
plt.rcParams['ytick.color'] = c_black
plt.rcParams['grid.linewidth'] = 1
plt.rcParams["figure.figsize"] = (15, 10)
plt.rcParams['figure.dpi'] = 300

warnings.filterwarnings("ignore")
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-a3nueeu4 because the default path (/home/elacson/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.

Utility Functions¶

In [2]:
@ticker.FuncFormatter
def billion_formatter(x, pos):
    """Returns formatted values in billions."""
    return '{:,.1f} B'.format(x/1E9)


@ticker.FuncFormatter
def million_formatter(x, pos):
    """Returns formatted values in millions."""
    return '{:,.1f} M'.format(x/1E6)


@ticker.FuncFormatter
def thousands_formatter(x, pos):
    """Returns formatted values in thousands."""
    return '{:,.1f} K'.format(x/1E3)


def display_full(x):
    """Returns a dataframe with customized display settings."""
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 2000)
    pd.set_option('display.float_format', '{:20,.2f}'.format)
    pd.set_option('display.max_colwidth', None)
    display(x)
    pd.reset_option('display.max_rows')
    pd.reset_option('display.max_columns')
    pd.reset_option('display.width')
    pd.reset_option('display.float_format')
    pd.reset_option('display.max_colwidth')


def highlight_peaks(ax):
    """Returns customized highlights for plots in the specified dates."""
    ax.axvspan("2020-06-01", "2020-08-01", color="#fcd116", alpha=0.3)
    ax.axvspan("2019-02-01", "2019-04-01", color="#fcd116", alpha=0.3)
    ax.axvspan("2016-10-01", "2016-12-01", color="#fcd116", alpha=0.3)
    ax.axvspan("2017-06-01", "2017-08-01", color="#fcd116", alpha=0.3)


def outliers():
    """Return a pandas data frame containing outliers removed from the
    dataset.
    """
    return pd.read_sql("""
    SELECT *
    FROM outliers
    """, conn)


def trends_yearly():
    """Returns yearly importation trend based on dutiable value and duties 
    and taxes."""

    # Create df
    df_sum_years = (df_imports_agg.groupby([pd.Grouper(key='MONTH_YEAR',
                                                       freq='Y'), 'HSCODE_2'])
                    .sum().sort_values(by=['MONTH_YEAR',
                                           'SUM_DUTIABLE_VALUE_PHP'],
                                       ascending=[True, False]).reset_index())
    df_sum_years = df_sum_years.merge(df_chapters, on='HSCODE_2')
    df_sum_years

    # Plot yearly importation based on dutiable value in PHP
    fig, ax = plt.subplots(figsize=(15, 20), dpi=200)

    df_year = df_sum_years.groupby('MONTH_YEAR').sum().reset_index()
    df_year.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
                 color=c_blue, legend=False, figsize=(15, 10), ax=ax)

    marks = df_year[df_year['SUM_DUTIABLE_VALUE_PHP'] > 6e12].iloc[:, :2]
    marks.columns = ['MONTH_YEAR', 'EXCEEDS_6_TRILLION']
    marks.plot(x='MONTH_YEAR', y='EXCEEDS_6_TRILLION',
               legend=False, style='r*', ms=15, ax=ax)

    title = """Philippine imports' dutiable value peaked by 2018"""
    ax.set_title(title, fontsize=20, color=c_blue)
    ax.set_xlabel('Year', fontsize=16)
    plt.xticks(fontsize=12)
    ax.set_ylabel('Trillion PHP', fontsize=16)
    ax.grid(axis='x')
    ax.axvspan("2020-04-01", "2023", color=c_yellow, alpha=0.3)
    style = dict(size=20, color=c_black)
    ax.text('2020-05', 3.5e12, 'Covid-19 Pandemic',
            ha='left', zorder=5, **style)

    plt.show()

    def highlight_peaks(ax):
        """Returns customized highlights for plots in the specified dates."""
        ax.axvspan("2020-06-01", "2020-08-01", color="#fcd116", alpha=0.3)
        ax.axvspan("2019-02-01", "2019-04-01", color="#fcd116", alpha=0.3)
        ax.axvspan("2016-10-01", "2016-12-01", color="#fcd116", alpha=0.3)
        ax.axvspan("2017-06-01", "2017-08-01", color="#fcd116", alpha=0.3)

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 20), dpi=200)
    fig.suptitle("Yearly trend of customs imports", fontsize=24)

    #df_year = df_sum_years.groupby('MONTH_YEAR').sum().reset_index()
    df_year['DUTES_AND_TAXES_PER_DUTIABLE_VALUE'] = \
        df_year['SUM_DUTIES_AND_TAXES']*100/df_year['SUM_DUTIABLE_VALUE_PHP']

    df_year.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
                 color=c_blue, ax=ax1)
    df_year.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
                 color=c_red, ax=ax1)
    df_year.plot(x='MONTH_YEAR', y='DUTES_AND_TAXES_PER_DUTIABLE_VALUE',
                 color=c_blue, ax=ax2)

    title = """Massive increase in importation in 2018 did not 
    translate to duties and taxes.
    New tax law rate implementation seen starting 2020"""
    ax1.set_title(title, fontsize=18, color=c_blue)
    ax1.set_xlabel(None)
    ax1.set_ylabel('Trillion PHP', fontsize=16)
    ax2.set_xlabel('Year', fontsize=16)
    ax2.set_ylabel('% of Duties & Taxes vs Dutiable Value', fontsize=16)
    ax1.tick_params(axis="x", labelsize=0)
    ax2.tick_params(axis="x", labelsize=12)
    ax1.legend([])
    ax2.legend([])
    ax1.grid(axis='x')
    ax2.grid(axis='x')
    highlight_peaks(ax1)
    highlight_peaks(ax2)
    plt.subplots_adjust(wspace=0, hspace=0)
    fig.subplots_adjust(top=0.92)

    style = dict(size=10, color=c_red)
    ax1.text('2015-02', 6e11, 'DUTIES AND TAXES', ha='left', **style)
    style = dict(size=10, color=c_blue)
    ax1.text('2015-02', 3.3e12, 'DUTIABLE VALUE', ha='left', **style)
    text = 'DUTIES AND TAXES PER \nDUTIABLE VALUE'
    ax2.text('2015-02', 11.5, text, ha='left', **style)
    ax2.set(ylim=(0, 20))

    plt.show()


def trends_monthly():
    """Plot monthly trend based on dutiable value, duties & taxes, and 
    overall monthly trend of duties and taxes over dutiable value.
    """
    fig, ax = plt.subplots(figsize=(15, 20), dpi=200)
    fig.suptitle("Monthly sum of dutiable value", fontsize=24)

    df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
    df_month.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
                  color=c_blue, legend=False, figsize=(15, 10), ax=ax)

    marks = df_month[df_month['SUM_DUTIABLE_VALUE_PHP'] > 7e11].iloc[:, :2]
    marks.columns = ['MONTH_YEAR', 'EXCEEDS_700_BILLION']
    marks.plot(x='MONTH_YEAR', y='EXCEEDS_700_BILLION',
               legend=False, style='r*', ms=15, ax=ax)

    title = """Total dutiable value exceeds 700B in the last four months,
    that threshold was surpassed only four other times before."""
    ax.set_title(title, fontsize=20, color=c_blue)
    ax.set_xlabel('Year', fontsize=16)
    plt.xticks(fontsize=12)
    ax.set_ylabel('Trillion PHP', fontsize=16)
    ax.grid(axis='x')
    ax.axhline(y=7e11, color=c_red, linestyle='--')
    ax.axvspan("2020-04-01", "2023", color=c_yellow, alpha=0.3)

    style = dict(size=20, color=c_red)
    ax.text('2015-02', 7.1e11, '700 Billion PHP', ha='left', zorder=5, **style)
    style = dict(size=20, color=c_black)
    ax.text('2020-05', 18e10, 'Covid-19 Pandemic',
            ha='left', zorder=5, **style)

    plt.show()

    # Plot by duties and taxes
    fig, ax = plt.subplots(figsize=(15, 20), dpi=200)
    fig.suptitle("Monthly sum of duties and taxes", fontsize=24)

    df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
    df_month['SUM_DUTIES_AND_TAXES'] = df_month['SUM_DUTIES_AND_TAXES'] / 1e9
    df_month.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
                  color=c_blue, legend=False, figsize=(15, 10), ax=ax)

    title = """
    The significant jump in duties and taxes started during the pandemic:
    All months after June 2020 are greater than all months before"""
    plt.title(title, fontsize=20, color=c_blue)
    plt.xlabel('Year', fontsize=16)
    plt.xticks(fontsize=12)
    plt.ylabel('Billion PHP', fontsize=16)
    plt.grid(axis='x')
    plt.axvspan("2020-04-01", "2023", color="#fcd116", alpha=0.3)
    ax.axhline(y=68, color=c_red, linestyle='--')

    style = dict(size=15, color=c_blue)
    ax.text('2017-01', 50, 'SUM OF DUTIES AND TAXES', ha='center', **style)
    style = dict(size=20, color=c_red)
    ax.text('2015-02', 68.5, 'June 2020 (68B PHP)', ha='left', **style)
    style = dict(size=20, color=c_black)
    ax.text('2020-05', 12, 'Covid-19 Pandemic', ha='left', **style)

    plt.show()

    # Plot overall monthly trend of duties and taxes over dutiable value
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 20), dpi=200)
    fig.suptitle("Monthly trend of customs imports", fontsize=24)

    df_month = df_imports_agg.groupby('MONTH_YEAR').sum().reset_index()
    df_month['DUTES_AND_TAXES_PER_DUTIABLE_VALUE'] = \
        df_month['SUM_DUTIES_AND_TAXES']*100/df_month['SUM_DUTIABLE_VALUE_PHP']

    df_month.plot(x='MONTH_YEAR', y='SUM_DUTIABLE_VALUE_PHP',
                  color=c_blue, ax=ax1)
    df_month.plot(x='MONTH_YEAR', y='SUM_DUTIES_AND_TAXES',
                  color=c_red, ax=ax1)
    df_month.plot(x='MONTH_YEAR', y='DUTES_AND_TAXES_PER_DUTIABLE_VALUE',
                  color=c_blue, ax=ax2)

    title = """Huge spikes in importation did not translate to duties and taxes,
    Items that caused the spikes had tax exemptions"""
    ax1.set_title(title, fontsize=24, color=c_blue)
    ax1.set_xlabel(None)
    ax1.set_ylabel('Trillion PHP', fontsize=16)
    ax2.set_xlabel('Year', fontsize=16)
    ax2.set_ylabel('Percentage', fontsize=16)
    ax1.tick_params(axis="x", labelsize=0)
    ax2.tick_params(axis="x", labelsize=12)
    ax1.legend([])
    ax2.legend([])
    ax1.grid(axis='x')
    ax2.grid(axis='x')
    highlight_peaks(ax1)
    highlight_peaks(ax2)
    plt.subplots_adjust(wspace=0, hspace=0)
    fig.subplots_adjust(top=0.92)

    style = dict(size=10, color=c_red)
    ax1.text('2015-02', 5e10, 'SUM DUTIES AND TAXES', ha='left', **style)
    style = dict(size=10, color=c_blue)
    ax1.text('2015-02', 4.5e11, 'SUM DUTIABLE VALUE', ha='left', **style)
    text = 'DUTIES AND TAXES PER \nDUTIABLE VALUE'
    ax2.text('2015-02', 14, text, ha='left', **style)
    ax2.set(ylim=(0, 25))

    plt.show()


def prrd_top_bot_chapters():
    """Returns plots of the top 10 and bottom 5 chapters during PRRD's Term.
    """
    # Plot Top 10 Chapters using a bar graph based on dutiable value
    # Create a df containing importation data during PRRD's term
    mask = ((df_imports_agg['MONTH_YEAR'] >= "2016-06-01") &
            (df_imports_agg['MONTH_YEAR'] <= "2022-06-01"))
    df_prrd = df_imports_agg[mask]

    # Save the top & bottom chapters by dutiable value & duties & taxes in PHP
    n = 10
    df_topn_dv = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
                  ['SUM_DUTIABLE_VALUE_PHP']
                  .sum()
                  .sort_values(ascending=False)[:n]
                  .reset_index())
    df_topn_dt = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
                  ['SUM_DUTIES_AND_TAXES']
                  .sum().sort_values(ascending=False)[:n].reset_index())
    n = 5
    df_botn_dv = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
                  ['SUM_DUTIABLE_VALUE_PHP']
                  .sum().sort_values(ascending=True)[:n].reset_index())
    df_botn_dt = (df_prrd.groupby(['HSCODE_2', 'CHAPTER'])
                  ['SUM_DUTIES_AND_TAXES']
                  .sum().sort_values(ascending=True)[:n].reset_index())
    # Set labels
    label_top_dv = ('Electrical machinery \nand equipment',
                    'Mineral fuels, mineral oils',
                    'Nuclear reactors, boilers',
                    'Vehicles other than railway \nor tramway rolling-stock',
                    'Plastics and articles thereof',
                    'Iron and steel',
                    'Ores, slag and ash',
                    'Cereals',
                    'Optical, photographic, \ncinematographic',
                    'Pharmaceutical products')

    label_top_dt = ('Mineral fuels, mineral oils',
                    'Electrical machinery \nand equipment',
                    'Vehicles other than railway \nor tramway rolling-stock',
                    'Nuclear reactors, boilers',
                    'Plastics and articles thereof',
                    'Iron and steel', 'Miscellaneous chemical\n products',
                    'Cereals', 'Miscellaneous edible \npreparations',
                    'Ores, slag and ash')

    # Based on dutiable value
    x = df_topn_dv['CHAPTER']
    y = df_topn_dv['SUM_DUTIABLE_VALUE_PHP']

    font1 = {'family': 'serif', 'color': c_blue}
    fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
    fig.suptitle("Top 10 chapters during PRRD's term",
                 fontsize=24, fontdict=font1)

    # Top 10 chapters based on dutiable value
    ax.barh(x, y, color=[c_blue, c_yellow, 'lightgray', 'lightgray',
                         'lightgray', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray'], tick_label=label_top_dv)
    ax.invert_yaxis()  # labels read top-to-bottom

    ax.set_xlabel('Dutiable Value in Trillion PHP', fontsize=16)
    ax.set_title("PH imports Chapter 85: Electrical machinery the most, "
                 "but ...",
                 fontsize=20, color=c_blue)
    ax.yaxis.set_label_position("right")
    ax.tick_params(axis='x', labelsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.yaxis.tick_right()
    plt.show()

    # Top 10 chapters based on duties and taxes
    fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
    x = df_topn_dt['CHAPTER']
    y = df_topn_dt['SUM_DUTIES_AND_TAXES']

    ax.barh(x, y, color=[c_blue, c_yellow, 'lightgray', 'lightgray',
                         'lightgray', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray'], tick_label=label_top_dt)
    ax.invert_yaxis()  # labels read top-to-bottom
    ax.set_xlabel('Total duties and taxes in trillion PHP', fontsize=16)

    ax.set_title("PH collects more duties and taxes from chapter 27: "
                 "Mineral fuels", fontsize=20, color=c_blue)
    ax.yaxis.set_label_position("right")
    ax.tick_params(axis='x', labelsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.yaxis.tick_right()
    plt.show()

    # Plot treemap of chapters based on Dutiable value
    # Set labels
    label_top_dv = df_topn_dv.HSCODE_2.tolist()
    label_top_dt = df_topn_dt.HSCODE_2.tolist()
    df_tree_dv = (df_prrd.groupby(['HSCODE_2'])
                  ['SUM_DUTIABLE_VALUE_PHP'].sum().reset_index())
    df_tree_dv['TOP'] = np.where(df_tree_dv['HSCODE_2'].isin(label_top_dv),
                                 'Part of Top 10', 'Others')

    fig = px.treemap(df_tree_dv, path=['HSCODE_2'],
                     values='SUM_DUTIABLE_VALUE_PHP', width=1000, height=700,
                     color='TOP',
                     color_discrete_map={'Part of Top 10': c_blue,
                                         'Others': c_gray})
    fig.update_layout(title=dict(
                      text='70% of importations based on dutiable value '
                           'came from the top 10 chapters...',
                      font=dict(size=18, color=c_blue)),
                      margin=dict(t=50, l=25, r=25, b=25),
                      uniformtext_minsize=16, uniformtext_mode='hide')
    fig.show()

    # Plot treemap of chapters based on Duties & Taxes
    df_tree_dt = (df_prrd.groupby(['HSCODE_2'])
                  ['SUM_DUTIES_AND_TAXES'].sum().reset_index())
    df_tree_dt['TOP'] = np.where(df_tree_dt['HSCODE_2'].isin(label_top_dt),
                                 'Part of Top 10', 'Others')

    fig = px.treemap(df_tree_dt, path=['HSCODE_2'],
                     values='SUM_DUTIES_AND_TAXES', width=1000, height=700,
                     color='TOP',
                     color_discrete_map={'Part of Top 10': c_blue,
                                         'Others': c_gray})
    fig.update_layout(title=dict(
                      text='While 74% based on duties & taxes '
                           'came from the top 10 chapters',
                      font=dict(size=18, color=c_blue)),
                      margin=dict(t=50, l=25, r=25, b=25),
                      uniformtext_minsize=16, uniformtext_mode='hide')
    fig.show()

    # Plot bottom chapters
    # Based on dutiable value
    x = df_botn_dv['CHAPTER']
    y = df_botn_dv['SUM_DUTIABLE_VALUE_PHP']
    label_bot_dv = ['Vegetable plaiting materials',
                    'Live trees and other plants',
                    'Manufactures of straw & \nplaiting materials',
                    'Cork and articles of cork',
                    "Works of art, collectors' \npieces and antiques"]
    label_bot_dt = ['Vegetable plaiting materials', 'Live animals', 'Silk',
                    'Live trees and other plants',
                    'Cork and articles of cork']
    font1 = {'family': 'serif', 'color': c_red}
    fig, ax = plt.subplots(figsize=(15, 10), dpi=200)
    fig.suptitle("Bottom 5 chapters during PRRD's term makes sense...",
                 fontsize=24, fontdict=font1)

    # Bottom 5 chapters based on dutiable value
    ax.barh(x, y, color=[c_yellow, c_yellow, 'lightgray', 'lightgray',
                         'lightgray'], tick_label=label_bot_dv)
    ax.invert_yaxis()  # labels read top-to-bottom

    ax.set_xlabel('Dutiable value in billion PHP', fontsize=16, color=c_red)
    ax.set_title("Vegetable plaiting is the least imported product "
                 "and live trees are highly regulated",
                 fontsize=20, color=c_red)
    ax.yaxis.set_label_position("right")
    ax.tick_params(axis='x', labelsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.yaxis.tick_right()
    plt.show()

    # Bottom 5 chapters based on duties and taxes
    fig, ax = plt.subplots(figsize=(15, 8), dpi=200)
    x = df_botn_dt['CHAPTER']
    y = df_botn_dt['SUM_DUTIES_AND_TAXES']

    ax.barh(x, y, color=['lightgray', c_yellow, 'lightgray', c_yellow,
                         'lightgray'], tick_label=label_bot_dt)
    ax.invert_yaxis()  # labels read top-to-bottom
    ax.set_xlabel('Total duties and taxes in 10 Million PHP', fontsize=20,
                  color=c_red)

    ax.set_title("Live animals and trees are in the bottom due to import"
                 " restrictions", fontsize=20, color=c_red)
    ax.yaxis.set_label_position("right")
    ax.tick_params(axis='x', labelsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.yaxis.tick_right()
    plt.show()


def top5_compare_chapters():
    """Returns plots of comparison of top 5 chapters based on dutiable
    value against duties & taxes.
    """
    # Create DF for top 200 (DUTIABLE VALUE)
    df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')

    topn = 200
    df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))

    # Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
    df_top_yr_hs_dv = (df_top_200.reset_index())
    df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
        'SUM_DUTIABLE_VALUE_PHP']
        .rank('dense', ascending=False))
    df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
    df_rank_15_dv

    # Rank by [Month_Year and HSCODE] (DUTY TAXES)
    df_top_yr_hs_dt = (df_top_200.reset_index())
    df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
        'SUM_DUTIES_AND_TAXES']
        .rank('dense', ascending=False))
    df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
    df_rank_15_dt

    # Create a list of Top Items (DV & DT)
    list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
    list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()

    # Create a merged DF for identified top15 HSCODES (DV & DT)
    df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dv)]
    df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dt)]

    # Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_yr_hs_dv = (df_imports_merged_dv
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIABLE_VALUE_PHP'],
                                      ascending=[True, False]))
    df_top15_yr_hs_dt = (df_imports_merged_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))

    # Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))

    df_top5_sdt = (df_imports_merged
                   .groupby('HSCODE_2')
                   .sum()
                   .nlargest(5, 'SUM_DUTIES_AND_TAXES')
                   .sort_values(by=['SUM_DUTIES_AND_TAXES'],
                                ascending=[False])
                   .reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
                                    'SUM_DUTIES_AND_TAXES',
                                    'SUM_NET_MASS_KGS']]
    df_top5_sdvp = (df_imports_merged
                    .groupby('HSCODE_2')
                    .sum()
                    .nlargest(5, 'SUM_DUTIABLE_VALUE_PHP')
                    .sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
                                 ascending=[False])
                    .reset_index()
                    .set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
                                            'SUM_DUTIES_AND_TAXES',
                                            'SUM_NET_MASS_KGS']])
    df_color = df_top5_sdvp.reset_index().drop(
        ['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
         'SUM_NET_MASS_KGS'], axis=1)
    colors = {'85': '#0038a8',
              '27': '#fcd116',
              '84': '#a9a9a9',
              '87': '#989898',
              '39': '#878787'}

    # FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
    df_top5_sdvp_plt = df_top5_sdvp.reset_index().drop('HSCODE_2', axis=1)
    label1 = ['Top 1: CH85', 'Top 2: CH27',
              'Top 3: CH84', 'Top 4: CH87', 'Top 5: CH39']

    # FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
    df_top5_sdt_plt = df_top5_sdt.reset_index().drop('HSCODE_2', axis=1)
    label2 = ['Top 2.5: CH85', 'Top 1: CH27',
              'Top 2.5: CH84', 'Top 2: CH87', 'Top 5: CH39']

    fig, ax = plt.subplots(1, 2, figsize=(13, 6))

    ax[0].pie(x=df_top5_sdvp_plt['SUM_DUTIABLE_VALUE_PHP'], autopct='%1.2f%%',
              colors=[
              colors[v] for v in df_color['HSCODE_2'].value_counts().keys()])
    ax[0].legend(label1, loc='best', bbox_to_anchor=(1, 0.75))

    ax[1].pie(x=df_top5_sdvp_plt['SUM_DUTIES_AND_TAXES'], autopct='%1.2f%%',
              colors=[
              colors[v] for v in df_color['HSCODE_2'].value_counts().keys()])
    ax[1].legend(label2, loc='best', bbox_to_anchor=(1, 0.75))

    ax[0].set_title('SUM_DUTIABLE_VALUE_PHP', fontsize=8, y=-0.01)
    ax[1].set_title('SUM_DUTIES_AND_TAXES_PHP', fontsize=8, y=-0.01)

    plt.suptitle(
        'Top 5 chapters comparison: dutiable value and duties & taxes',
        fontsize=16)
    plt.tight_layout()
    plt.plot()


def yearly_top20_chapters():
    """Returns a plot of the top 20 chapters."""
    # Create DF for top 200 (DUTIABLE VALUE)
    df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')

    topn = 200
    df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))

    # Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
    df_top_yr_hs_dv = (df_top_200.reset_index())
    df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
        'SUM_DUTIABLE_VALUE_PHP']
        .rank('dense', ascending=False))
    df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
    df_rank_15_dv

    # Rank by [Month_Year and HSCODE] (DUTY TAXES)
    df_top_yr_hs_dt = (df_top_200.reset_index())
    df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
        'SUM_DUTIES_AND_TAXES']
        .rank('dense', ascending=False))
    df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
    df_rank_15_dt

    # Create a list of Top Items (DV & DT)
    list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
    list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()

    # Create a merged DF for identified top15 HSCODES (DV & DT)
    df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dv)]
    df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dt)]

    # Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_yr_hs_dv = (df_imports_merged_dv
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIABLE_VALUE_PHP'],
                                      ascending=[True, False]))
    df_top15_yr_hs_dt = (df_imports_merged_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))

    # Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))

    # PLOT ALL TOP 15 wrt DUTIABLE VALUE
    # Highlight Top 1 and Top 2

    ax = df_top15_hs_dv_pvt.drop(columns=['85', '27']).plot(
        figsize=(15, 10), color='#9c9a9d', alpha=0.3)
    df_top15_hs_dv_pvt['85'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
    df_top15_hs_dv_pvt['27'].plot(color='#fcd116', ax=ax, legend=True, lw=5)

    style = dict(size=10, color='#fcd116')
    ax.text('2017-02', .87E12, '           CHAPTER 27', ha='left', **style)
    style = dict(size=10, color='#0038a8')
    ax.text('2017-02', 1.55E12, 'CHAPTER 85', ha='left', **style)

    plt.suptitle(
        'Chapter 85 consistently leads the dutiable value since 2016',
        fontsize=18, color=c_blue)
    plt.title("Chapter 27, though rank 2 in overall dutiable value, \n"
              "has a fluctuating trend with a big dip from 2019 to 2020",
              fontsize=16)
    plt.show()

    # PLOT ALL TOP 15 wrt DUTY TAX
    # Highlight Top 1 and Top 2

    df_top15_hs_dt_pvt

    ax = df_top15_hs_dt_pvt.drop(columns=['85', '27']).plot(
        figsize=(15, 10), color='#9c9a9d', alpha=0.3)
    df_top15_hs_dt_pvt['85'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
    df_top15_hs_dt_pvt['27'].plot(color='#fcd116', ax=ax, legend=True, lw=5)

    plt.suptitle(
        'On the other hand, chapter 27 consistently leads assessed \n'
        'import duties & taxes since 2015', fontsize=18, color=c_blue)
    plt.title('Chapter 85 is rank 2 in assessed import duties & taxes',
              fontsize=16)

    style = dict(size=10, color='#fcd116')
    ax.text('2017-02', 1.38E11, 'CHAPTER 27', ha='left', **style)
    style = dict(size=10, color='#0038a8')
    ax.text('2017-02', .4E11, 'CHAPTER 85', ha='left', **style)

    plt.show()


def compare_top2_chapters():
    """Returns plots comparing top 2 chapters based on dutiable value
    against duties & taxes."""
    # Create DF for top 200 (DUTIABLE VALUE)
    df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')

    topn = 200
    df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))

    # Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
    df_top_yr_hs_dv = (df_top_200.reset_index())
    df_top_yr_hs_dv['rank'] = (df_top_yr_hs_dv.groupby(['MONTH_YEAR'])[
        'SUM_DUTIABLE_VALUE_PHP']
        .rank('dense', ascending=False))
    df_rank_15_dv = df_top_yr_hs_dv[df_top_yr_hs_dv['rank'] <= 15]
    df_rank_15_dv

    # Rank by [Month_Year and HSCODE] (DUTY TAXES)
    df_top_yr_hs_dt = (df_top_200.reset_index())
    df_top_yr_hs_dt['rank'] = (df_top_yr_hs_dt.groupby(['MONTH_YEAR'])[
        'SUM_DUTIES_AND_TAXES']
        .rank('dense', ascending=False))
    df_rank_15_dt = df_top_yr_hs_dt[df_top_yr_hs_dt['rank'] <= 15]
    df_rank_15_dt

    # Create a list of Top Items (DV & DT)
    list_top_hscode_dv = df_rank_15_dv['HSCODE_2'].unique().tolist()
    list_top_hscode_dt = df_rank_15_dt['HSCODE_2'].unique().tolist()

    # Create a merged DF for identified top15 HSCODES (DV & DT)
    df_imports_merged_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dv)]
    df_imports_merged_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_top_hscode_dt)]

    # Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_yr_hs_dv = (df_imports_merged_dv
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIABLE_VALUE_PHP'],
                                      ascending=[True, False]))
    df_top15_yr_hs_dt = (df_imports_merged_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nlargest(topn, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR', 'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))

    # Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_top15_hs_dv_pvt = (df_top15_yr_hs_dv.reset_index().pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    df_top15_hs_dt_pvt = (df_top15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(13, 6))
    df_top15_hs_dv_pvt['85'].plot(color='#0038a8', ax=ax1, legend=True, lw=5)
    df_top15_hs_dv_pvt['27'].plot(color='#fcd116', ax=ax1, legend=True, lw=5)
    ax1.legend(loc='upper left')
    df_top15_hs_dt_pvt['85'].plot(color='#0038a8', ax=ax2, legend=True, lw=5)
    df_top15_hs_dt_pvt['27'].plot(color='#fcd116', ax=ax2, legend=True, lw=5)
    plt.legend(loc='upper left')

    ax1.set_title(
        'Segmentation based on consistent behavior for the 2 chapters. \n '
        'Segmented into 2 [2019-2020] & [2020-2022]', fontsize=8)
    ax2.set_title('As opposed to the figure on the right, segmentations from '
                  'behavior is now 3. \n Duties & taxes for CH 85 '
                  'deviated from normal', fontsize=8)

    style = dict(size=10, color='#fcd116')
    ax1.text('2015-02', 0.6E12, '           CHAPTER 27', ha='left', **style)
    style = dict(size=10, color='#0038a8')
    ax1.text('2015-02', 1.4E12, '       CHAPTER 85', ha='left', **style)

    style = dict(size=10, color='#fcd116')
    ax2.text('2015-02', 01.5E11, '           CHAPTER 27', ha='left', **style)
    style = dict(size=10, color='#0038a8')
    ax2.text('2015-02', 0.5e11, '       CHAPTER 85', ha='left', **style)

    ax1.axvline(x=pd.to_datetime('2019-06-15'), color='k', ls='--')
    ax1.axvline(x=pd.to_datetime('2020-06-15'), color='k', ls='--')
    ax2.axvline(x=pd.to_datetime('2020-06-15'), color='k', ls='--')
    ax2.axvline(x=pd.to_datetime('2021-06-15'), color='k', ls='--')
    ax1.axvspan("2019", "2020", color="b", alpha=0.3)
    ax1.axvspan("2020", "2022", color="#fcd116", alpha=0.3)
    ax2.axvspan("2019", "2020", color="b", alpha=0.3)
    ax2.axvspan("2020", "2021", color="r", alpha=0.3)
    ax2.axvspan("2021", "2022", color="#fcd116", alpha=0.3)

    plt.suptitle(
        'There is an opposite trend behavior for chapter 27 in year 2020 to 2021',
        fontsize=16, color=c_blue)
    plt.tight_layout()
    plt.plot()


def bottom5_compare_chapters():
    """Returns a plot of the bottom 5 chapters based on dutiable value
    against duties and taxes.
    """
    # Create DF for top 200 (DUTIABLE VALUE)
    df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
    topn = 200
    df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))
    botn = 200
    df_bot_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nsmallest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))

    # Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
    df_bot_yr_hs_dv = (df_bot_200.reset_index())
    df_bot_yr_hs_dv['rank'] = (df_bot_yr_hs_dv.groupby(['MONTH_YEAR'])[
        'SUM_DUTIABLE_VALUE_PHP']
        .rank('dense', ascending=True))
    df_rank_bot_15_dv = df_bot_yr_hs_dv[df_bot_yr_hs_dv['rank'] <= 15]

    # Rank by [Month_Year and HSCODE] (DUTY TAXES)
    df_bot_yr_hs_dt = (df_bot_200.reset_index())
    df_bot_yr_hs_dt['rank'] = (df_bot_yr_hs_dt.groupby(['MONTH_YEAR'])[
        'SUM_DUTIES_AND_TAXES']
        .rank('dense', ascending=True))
    df_rank_bot_15_dt = df_bot_yr_hs_dt[df_bot_yr_hs_dt['rank'] <= 15]

    # Create a list of Bottom Items (DV & DT)
    list_bot_hscode_dv = df_rank_bot_15_dv['HSCODE_2'].unique().tolist()
    list_bot_hscode_dt = df_rank_bot_15_dt['HSCODE_2'].unique().tolist()

    # Create a merged DF for identified top15 HSCODES (DV & DT)
    df_imports_merged_bot_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_bot_hscode_dv)]
    df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_bot_hscode_dt)]

    # Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_bot15_yr_hs_dv = (df_imports_merged_bot_dv
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nsmallest(15, 'SUM_DUTIABLE_VALUE_PHP')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIABLE_VALUE_PHP'],
                                      ascending=[True, False]))
    df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nsmallest(botn, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))

    # Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_bot15_hs_dv_pvt = (df_bot15_yr_hs_dv.reset_index().pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))

    df_bot5_sdt = (df_imports_merged
                   .groupby('HSCODE_2')
                   .sum()
                   .nsmallest(5, 'SUM_DUTIES_AND_TAXES')
                   .sort_values(by=['SUM_DUTIES_AND_TAXES'],
                                ascending=[True])
                   .reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
                                    'SUM_DUTIES_AND_TAXES',
                                    'SUM_NET_MASS_KGS']]
    df_bot5_sdvp = (df_imports_merged
                    .groupby('HSCODE_2')
                    .sum()
                    .nsmallest(5, 'SUM_DUTIABLE_VALUE_PHP')
                    .sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
                                 ascending=[True])
                    .reset_index()
                    .set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
                                            'SUM_DUTIES_AND_TAXES',
                                            'SUM_NET_MASS_KGS']])
    df_colorb = df_bot5_sdvp.reset_index().drop(
        ['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
         'SUM_NET_MASS_KGS'], axis=1)
    df_colorbt = df_bot5_sdt.reset_index().drop(
        ['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
         'SUM_NET_MASS_KGS'], axis=1)
    colors = {'14': '#0038a8',
              '45': '#fcd116',
              '06': c_red,
              '46': '#a9a9a9',
              '97': '#878787',
              '01': '#878787',
              '50': '#a9a9a9'}

    # FINAL GRAPH COMPARISON

    # FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
    df_bot5_sdvp_plt = df_bot5_sdvp.reset_index().drop('HSCODE_2', axis=1)
    label1 = ['Bot 1: CH14', 'Bot 2: CH06',
              'Bot 3: CH46', 'Bot 4: CH45', 'Bot 5: CH97']

    # FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
    df_bot5_sdt_plt = df_bot5_sdt.reset_index().drop('HSCODE_2', axis=1)
    label2 = ['Bot 5: CH14', 'Bot 2: CH01',
              'Bot 4: CH50', 'Bot 3: CH45', 'Bot 1: CH06']

    fig, ax = plt.subplots(1, 2, figsize=(13, 6))

    ax[0].pie(x=df_bot5_sdvp_plt['SUM_DUTIABLE_VALUE_PHP'], autopct='%1.2f%%', colors=[
              colors[v] for v in df_colorb['HSCODE_2'].value_counts().keys()])
    ax[0].legend(label1, loc='best', bbox_to_anchor=(1, 0.75))

    ax[1].pie(x=df_bot5_sdvp_plt['SUM_DUTIES_AND_TAXES'], autopct='%1.2f%%', colors=[
              colors[v] for v in df_colorbt['HSCODE_2'].value_counts().keys()])
    ax[1].legend(label2, loc='best', bbox_to_anchor=(1, 0.75))

    ax[0].set_title('SUM_DUTIABLE_VALUE_PHP', fontsize=8, y=-0.01)
    ax[1].set_title('SUM_DUTIES_AND_TAXES', fontsize=8, y=-0.01)

    plt.suptitle(
        'Bottom 5 chapters comparison: Dutiable value and duties & taxes', fontsize=16)
    plt.tight_layout()
    plt.plot()


def yearly_bottom5_chapters():
    """Returns plots of yearly bottom 5 chapters based on dutiable value and
    duties and taxes.
    """
    # Create DF for top 200 (DUTIABLE VALUE)
    df_imports_merged = df_imports_agg.merge(df_chapters, on='HSCODE_2')
    topn = 200
    df_top_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))
    botn = 200
    df_bot_200 = (df_imports_merged.groupby([pd.Grouper(key='MONTH_YEAR',
                                                        freq='Y'),
                                             'HSCODE_2'])
                  .sum()
                  .nsmallest(topn, 'SUM_DUTIABLE_VALUE_PHP')
                  .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                               ascending=[True, False]))

    # Rank by [Month_Year and HSCODE] (DUTIABLE VALUE)
    df_bot_yr_hs_dv = (df_bot_200.reset_index())
    df_bot_yr_hs_dv['rank'] = (df_bot_yr_hs_dv.groupby(['MONTH_YEAR'])[
        'SUM_DUTIABLE_VALUE_PHP']
        .rank('dense', ascending=True))
    df_rank_bot_15_dv = df_bot_yr_hs_dv[df_bot_yr_hs_dv['rank'] <= 15]

    # Rank by [Month_Year and HSCODE] (DUTY TAXES)
    df_bot_yr_hs_dt = (df_bot_200.reset_index())
    df_bot_yr_hs_dt['rank'] = (df_bot_yr_hs_dt.groupby(['MONTH_YEAR'])[
        'SUM_DUTIES_AND_TAXES']
        .rank('dense', ascending=True))
    df_rank_bot_15_dt = df_bot_yr_hs_dt[df_bot_yr_hs_dt['rank'] <= 15]

    # Create a list of Bottom Items (DV & DT)
    list_bot_hscode_dv = df_rank_bot_15_dv['HSCODE_2'].unique().tolist()
    list_bot_hscode_dt = df_rank_bot_15_dt['HSCODE_2'].unique().tolist()

    # Create a merged DF for identified top15 HSCODES (DV & DT)
    df_imports_merged_bot_dv = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_bot_hscode_dv)]
    df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_bot_hscode_dt)]

    # Create a df gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_bot15_yr_hs_dv = (df_imports_merged_bot_dv
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nsmallest(15, 'SUM_DUTIABLE_VALUE_PHP')
                         .sort_values(by=['MONTH_YEAR', 'SUM_DUTIABLE_VALUE_PHP'],
                                      ascending=[True, False]))
    df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nsmallest(botn, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))

    # Pivot DF --> gb MONTH_YEAR & HSCODE_2 (DV & DT)
    df_bot15_hs_dv_pvt = (df_bot15_yr_hs_dv.reset_index().pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))

    df_bot5_sdt = (df_imports_merged
                   .groupby('HSCODE_2')
                   .sum()
                   .nsmallest(5, 'SUM_DUTIES_AND_TAXES')
                   .sort_values(by=['SUM_DUTIES_AND_TAXES'],
                                ascending=[True])
                   .reset_index())[['HSCODE_2', 'SUM_DUTIABLE_VALUE_PHP',
                                    'SUM_DUTIES_AND_TAXES',
                                    'SUM_NET_MASS_KGS']]
    df_bot5_sdvp = (df_imports_merged
                    .groupby('HSCODE_2')
                    .sum()
                    .nsmallest(5, 'SUM_DUTIABLE_VALUE_PHP')
                    .sort_values(by=['SUM_DUTIABLE_VALUE_PHP'],
                                 ascending=[True])
                    .reset_index()
                    .set_index('HSCODE_2')[['SUM_DUTIABLE_VALUE_PHP',
                                            'SUM_DUTIES_AND_TAXES',
                                            'SUM_NET_MASS_KGS']])
    df_colorb = df_bot5_sdvp.reset_index().drop(
        ['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
         'SUM_NET_MASS_KGS'], axis=1)
    df_colorbt = df_bot5_sdt.reset_index().drop(
        ['SUM_DUTIABLE_VALUE_PHP', 'SUM_DUTIES_AND_TAXES',
         'SUM_NET_MASS_KGS'], axis=1)
    colors = {'14': '#0038a8',
              '45': '#fcd116',
              '06': '#3cb043',
              '46': '#a9a9a9',
              '97': '#878787',
              '01': '#878787',
              '50': '#a9a9a9'}
    # FINAL GRAPH COMPARISON
    # FILTER TO TOP 5 HS CODE (SUM_DUTIABLE_VALUE_PHP)
    df_bot5_sdvp_plt = df_bot5_sdvp.reset_index().drop('HSCODE_2', axis=1)
    label1 = ['Bot 1: CH14', 'Bot 2: CH06',
              'Bot 3: CH46', 'Bot 4: CH45', 'Bot 5: CH97']

    # FILTER TO TOP 5 HS CODE (SUM_DUTIES_AND_TAXES)
    df_bot5_sdt_plt = df_bot5_sdt.reset_index().drop('HSCODE_2', axis=1)
    label2 = ['Bot 5: CH14', 'Bot 4: CH01',
              'Bot 3: CH50', 'Bot 2: CH45', 'Bot 1: CH06']

    # PLOT ALL TOP 15 wrt DUTIABLE VALUE
    # Highlight Top 1 and Top 2

    ax = df_bot15_hs_dv_pvt.drop(columns='14').plot(
        figsize=(15, 10), color='#9c9a9d', alpha=0.3)
    df_bot15_hs_dv_pvt['14'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
    style = dict(size=10, color='#0038a8')
    ax.text('2018', .8E7, 'CHAPTER 14', ha='left', **style)

    style = dict(size=10, color='#3cb043')
    ax.text('2016', 66003937, 'X CHAPTER 06', ha='left', **style)

    style = dict(size=10, color=c_red)
    ax.text('2022', 40623345, 'X CHAPTER 97', ha='left', **style)

    plt.legend(loc='upper right')
    plt.suptitle('Out of the 5 bottom chapters, it was only chapter 14 \n'
                 'that was present from 2016 to 2022 based on dutiable value.',
                 fontsize=18, color=c_blue)
    plt.title(
        'Chapter 06 & 97 were only present in 2016 and 2022, respectively',
        fontsize=16)
    plt.show()

    # Create DF for Bottom 15 wrt DT
    df_imports_merged_bot_dt = df_imports_merged[df_imports_merged['HSCODE_2'].isin(
        list_bot_hscode_dt)]

    # Create DF gb Month-Year & HS
    df_bot15_yr_hs_dt = (df_imports_merged_bot_dt
                         .groupby([pd.Grouper(key='MONTH_YEAR', freq='Y'),
                                   'HSCODE_2'])
                         .sum()
                         .nsmallest(15, 'SUM_DUTIES_AND_TAXES')
                         .sort_values(by=['MONTH_YEAR',
                                          'SUM_DUTIES_AND_TAXES'],
                                      ascending=[True, False]))
    # Create PVT table
    df_bot15_hs_dt_pvt = (df_bot15_yr_hs_dt.reset_index().pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='MONTH_YEAR',
        columns='HSCODE_2'))
    # Plot 1
    ax = df_bot15_hs_dt_pvt.drop(columns=['14', '50']).plot(
        figsize=(15, 10), color='#9c9a9d', alpha=0.3)
    df_bot15_hs_dt_pvt['14'].plot(color='#0038a8', ax=ax, legend=True, lw=5)
    df_bot15_hs_dt_pvt['50'].plot(color='#fcd116', ax=ax, legend=True, lw=5)

    style = dict(size=10, color='#0038a8')
    ax.text('2017', .46E6, 'CHAPTER 14', ha='left', **style)

    style = dict(size=10, color='#fcd116')
    ax.text('2016', .4E6, 'CHAPTER 50', ha='left', **style)

    plt.legend(loc='upper right')
    plt.suptitle('In terms of duties & taxes, instead of chapter 14, \n'
                 'it was chapter 50 that was present from 2016 to 2023',
                 fontsize=18, color=c_blue)
    plt.title(
        'Chapter 14 based on duties & taxes only had values from 2018 to 2020',
        fontsize=16)

    plt.show()


def prrd_top_countries():
    """Returns plots of top 10 countries by dutiable value and by
    duties and taxes."""
    # Create data frames for top 10 by country_export
    sql = """
    SELECT *
    FROM summary_year_cntry
    """

    df_imports_agg_by_yr_cntry = pd.read_sql(sql, conn)

    prrd_country = (df_imports_agg_by_yr_cntry[(
        df_imports_agg_by_yr_cntry['MONTH_YEAR'] >= '2016-06-01') &
        (df_imports_agg_by_yr_cntry['MONTH_YEAR'] <= '2022-06-30')])

    n = 10
    SDV_top10 = (prrd_country
                 .groupby(['COUNTRY_EXPORT'])['SUM_DUTIABLE_VALUE_PHP']
                 .sum().sort_values(ascending=False)[:n].reset_index())

    n = 10
    SDT_top10 = (prrd_country
                 .groupby(['COUNTRY_EXPORT'])['SUM_DUTIES_AND_TAXES']
                 .sum().sort_values(ascending=False)[:n].reset_index())

    # Plot based on Dutiable Value
    y = SDV_top10['SUM_DUTIABLE_VALUE_PHP']
    x = SDV_top10['COUNTRY_EXPORT']

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    ax.barh(x, y, color=['#0038a8', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray'])
    ax.invert_yaxis()
    ax.set_xlabel('Dutiable value in trillion PHP')
    ax.set_title("China is the top country based on dutiable value "
                 "during PRRD's term")
    ax.yaxis.set_label_position("right")
    ax.yaxis.tick_right()
    plt.show()

    # Plot based on duties and taxes
    y = SDT_top10['SUM_DUTIES_AND_TAXES']
    x = SDT_top10['COUNTRY_EXPORT']

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    ax.barh(x, y, color=['#0038a8', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray', 'lightgray', 'lightgray',
                         'lightgray', 'lightgray'])
    ax.invert_yaxis()
    ax.set_xlabel('Duties and taxes in billion PHP')
    ax.set_title("China is also the highest paying country based on duties and "
                 "taxes during PRRD's term")
    ax.yaxis.set_label_position("right")
    ax.yaxis.tick_right()
    plt.show()

    # Show amount & % to total of DV and DT during PRRD's Term
    total_SDV = prrd_country['SUM_DUTIABLE_VALUE_PHP'].sum()
    total_SDT = prrd_country['SUM_DUTIES_AND_TAXES'].sum()

    print(
        f"sum of dutiable value during PRRD's Term   = {total_SDV/1e12:,.2f} trillion")
    print(
        f"sum of duties and taxes during PRRD's Term = {total_SDT/1e12:,.2f} trillion")

    SDV_top10['% OF TOTAL'] = (
        SDV_top10['SUM_DUTIABLE_VALUE_PHP']/total_SDV)*100
    display(SDV_top10)

    SDT_top10['% OF TOTAL'] = (SDT_top10['SUM_DUTIES_AND_TAXES']/total_SDT)*100
    display(SDT_top10)


def yearly_top15_countries_dv():
    """Returns a plot of yearly top 15 countries by dutiable value."""
    # Plot top countries by dutiable value
    topn = 100
    topn_highest = (df_summary_year_cntry.groupby(
        [pd.Grouper(key='MONTH_YEAR', freq='Y'),
         'COUNTRY_EXPORT'])
        .sum()
        .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP'))

    topn_highest_yr_cntry = topn_highest.sort_index().reset_index()

    top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()

    paramlist = r'?'

    for i in range(1, len(top_countries)):
        paramlist = paramlist + r', ?'

    sql = f"""
    SELECT substr(month_year,1,4) as YR,
           country_export,
           sum(SUM_DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
           sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
           sum(SUM_NET_MASS_KGS) as SUM_NET_MASS_KGS
    FROM   summary_year_cntry
    WHERE  country_export in ({paramlist})
    GROUP BY YR, country_export
    """

    top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)

    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='YR',
        columns='COUNTRY_EXPORT'))

    # Plot
    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
    list_exclude = ['UNITED STATES', 'CHINA']
    list_cntry = [c for c in top_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)

    ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
            color=c_red, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'UNITED STATES'], linestyle='dashed',
            color=c_green, linewidth=2,
            zorder=5)

    style = dict(size=10, color=c_red)
    ax.text('2015', 550e9, 'CHINA', ha='center',
            **style)
    style = dict(size=10, color=c_green)
    ax.text('2015', 340e9, 'USA', ha='right',
            **style)

    ax.yaxis.set_major_formatter(billion_formatter)
    ax.set_xlabel('YEAR')
    ax.set_ylabel('DUTIABLE VALUE (in Billions PHP)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = "China importations way above USA\nimportations from 2015 to 2022."

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)

    plt.show()


def top15_countries_increases_dv():
    """Returns a plot of top 15 countries based on dutiable value which 
    highlights only countries that showed an increase in 2020."""
    # Plot top countries by dutiable value
    topn = 100
    topn_highest = (df_summary_year_cntry.groupby(
        [pd.Grouper(key='MONTH_YEAR', freq='Y'),
         'COUNTRY_EXPORT'])
        .sum()
        .nlargest(topn, 'SUM_DUTIABLE_VALUE_PHP'))

    topn_highest_yr_cntry = topn_highest.sort_index().reset_index()

    top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()

    paramlist = r'?'

    for i in range(1, len(top_countries)):
        paramlist = paramlist + r', ?'

    sql = f"""
    SELECT substr(month_year,1,4) as YR,
           country_export,
           sum(SUM_DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
           sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
           sum(SUM_NET_MASS_KGS) as SUM_NET_MASS_KGS
    FROM   summary_year_cntry
    WHERE  country_export in ({paramlist})
    GROUP BY YR, country_export
    """

    top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)

    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_DUTIABLE_VALUE_PHP',
        index='YR',
        columns='COUNTRY_EXPORT'))

    # Plot country_export that showed increase in importation
    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
    list_exclude = ['CHINA', 'JAPAN', 'KOREA', 'HONG KONG']
    list_cntry = [c for c in top_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)

    ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
            color=c_red, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'JAPAN'],
            color=c_yellow, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'KOREA'],
            color=c_blue, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'HONG KONG'],
            color=c_black, linewidth=2,
            zorder=5)

    ax.yaxis.set_major_formatter(billion_formatter)

    ax.text('2020', 1300e9, 'CHINA', ha='center', zorder=5,
            size=10, color=c_red)
    ax.text('2020', 900e9, 'JAPAN', ha='center', zorder=4,
            size=10, color=c_yellow)
    ax.text('2020', 600e9, 'KOREA', ha='center', zorder=3,
            size=10, color=c_blue)
    ax.text('2020', 400e9, 'HONG KONG', ha='center', zorder=2,
            size=10, color=c_black)

    ax.set_xlabel('YEAR')
    ax.set_ylabel('DUTIABLE VALUE (in Billions PHP)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = """Increased importations from four countries in 2020"""

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)
    plt.show()


def yearly_top15_countries_dt():
    """Returns a plot of top 15 countries based on duties and taxes."""
    # Plot top countries by duties and taxes
    topn = 100
    topn_highest = (df_summary_year_cntry.groupby(
        [pd.Grouper(key='MONTH_YEAR', freq='Y'),
         'COUNTRY_EXPORT'])
        .sum()
        .nlargest(topn, 'SUM_DUTIES_AND_TAXES'))

    topn_highest_yr_cntry = topn_highest.sort_index().reset_index()

    top_countries = topn_highest_yr_cntry['COUNTRY_EXPORT'].unique().tolist()

    paramlist = r'?'

    for i in range(1, len(top_countries)):
        paramlist = paramlist + r', ?'

    sql = f"""
    SELECT substr(month_year,1,4) as YR,
           country_export,
           sum(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES
    FROM   summary_year_cntry
    WHERE  country_export in ({paramlist})
    GROUP BY YR, country_export
    """
    top_countries_per_yr = pd.read_sql(sql, conn, params=top_countries)

    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='YR',
        columns='COUNTRY_EXPORT'))

    # Plot
    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
    list_exclude = ['UNITED STATES', 'CHINA']
    list_cntry = [c for c in top_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray)

    ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
            color=c_red, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'UNITED STATES'], linestyle='dashed',
            color=c_green, linewidth=2,
            zorder=5)

    ax.text('2015', 80e9, 'CHINA', ha='center',
            size=10, color=c_red)

    ax.text('2015', 25e9, 'USA', ha='right',
            size=10, color=c_green)

    ax.yaxis.set_major_formatter(billion_formatter)
    ax.set_xlabel('YEAR')
    ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = """From 2015 to 2022, China import
    duties and taxes (DT) are far above of USA."""

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)

    plt.show()

    # Plot top contributors based on duties and taxes
    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)
    list_exclude = ['CHINA', 'JAPAN', 'KOREA', 'HONG KONG']
    list_cntry = [c for c in top_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray,
            label=list_cntry)

    ax.plot(df_top_cntry_pvt.loc[:, 'CHINA'],
            color=c_red, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'JAPAN'],
            color=c_yellow, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'KOREA'],
            color=c_blue, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'HONG KONG'],
            color=c_black, linewidth=2,
            zorder=5)

    ax.yaxis.set_major_formatter(billion_formatter)

    ax.text('2020', 195e9, 'CHINA', ha='left', zorder=5,
            size=10, color=c_red)
    ax.text('2020', 100e9, 'JAPAN', ha='left', zorder=4,
            size=10, color=c_yellow)
    ax.text('2020', 130e9, 'KOREA', ha='left', zorder=3,
            size=10, color=c_blue)
    ax.text('2020', 35e9, 'HONG KONG', ha='left', zorder=2,
            size=10, color=c_black)

    ax.set_xlabel('YEAR')
    ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')

    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    # place a text box in upper left in axes coords
    textstr = """Importations from China, Korea and Japan are
    the top contributors of Duties and Taxes since 2020"""

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)

    plt.show()


def rice_volume():
    """Returns a plot of yearly rice importation based on volume, highlighting
    the top 2 countries."""
    # Plot yearly trend based on volume
    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_NETMASS_MT',
        index='YR',
        columns='COUNTRY_EXPORT'))

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    list_exclude = ['VIET NAM', 'THAILAND']
    list_cntry = [c for c in list_top_rice_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2,
            label=list_cntry, color=c_gray)

    ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
            zorder=5)

    ax.text('2015', 1e6, 'VIET NAM', ha='center',
            size=10, color=c_blue)
    ax.text('2015', 0.5e6, 'THAILAND', ha='center',
            size=10, color=c_red)

    ax.yaxis.set_major_formatter(million_formatter)
    ax.set_xlabel('YEAR')
    ax.set_ylabel('VOLUME (in Metric Tons)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = """Since 2019, the volume of rice importation
    from Viet Nam more than doubled from 0.8 million MT
    to 2.5 million Metric Tons."""

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)
    plt.grid(axis='y')
    plt.show()


def rice_dt():
    """Returns a plot of rice importation based on duties and taxes with 
    the top 2 countries highlighted."""
    # Plot based on duties and taxes
    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_DUTIES_AND_TAXES',
        index='YR',
        columns='COUNTRY_EXPORT'))

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    list_exclude = ['VIET NAM', 'THAILAND']
    list_cntry = [c for c in list_top_rice_countries if c not in list_exclude]

    ax.plot(df_top_cntry_pvt[list_cntry], linewidth=2, color=c_gray,
            label=list_cntry)

    ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
            zorder=5)

    ax.text('2022', 14e9, 'VIET NAM', ha='center',
            size=10, color=c_blue)
    ax.text('2022', 1.5e9, 'THAILAND', ha='center',
            size=10, color=c_red)

    ax.yaxis.set_major_formatter(billion_formatter)

    ax.set_xlabel('YEAR')
    ax.set_ylabel('DUTIES AND TAXES (in Billions PHP)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = """Since 2019, the duties and taxes of rice importation
    from Viet Nam increased 2.5 times."""

    ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)
    # ax.legend(loc='center right')
    plt.grid(axis='y')
    plt.show()


def rice_costperkilo():
    """Returns a plot of yearly imported rice's cost per kilo by country."""
    # Plot effective cost of rice per kg

    top_countries_per_yr = pd.read_sql(
        sql, conn, params=list_top_rice_countries)

    df_top_cntry_pvt = (top_countries_per_yr.pivot(
        values='SUM_DV_DT_PER_KGS',
        index='YR',
        columns='COUNTRY_EXPORT'))

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    ax.plot(df_top_cntry_pvt.loc[:, 'VIET NAM'], color=c_blue, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc[:, 'THAILAND'], color=c_red, linewidth=2,
            zorder=5)

    ax.plot(df_top_cntry_pvt.loc['2019':, 'MYANMAR (former BURMA)'], color=c_black,
            linewidth=2, zorder=5)

    ax.plot(df_top_cntry_pvt.loc['2017':, 'PAKISTAN'], color=c_yellow, linewidth=2,
            zorder=5)

    ax.text('2015', 20.2, 'VIET NAM', ha='center',
            size=10, color=c_blue)
    ax.text('2015', 21.3, 'THAILAND', ha='center',
            size=10, color=c_red)
    ax.text('2017', 21.8, 'PAKISTAN', ha='left',
            size=10, color=c_yellow)
    ax.text('2019', 19.7, 'MYANMAR', ha='left',
            size=10, color=c_black)

    ax.set_xlabel('YEAR')
    ax.set_ylabel('IMPORTATION COST PER KILOGRAM (in PHP)')

    # place a text box in upper left in axes coords
    props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)

    textstr = """The cost of rice importation ranged
    between 20 to 28 pesos per kilogram."""

    ax.text(0.05, 0.85, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', bbox=props)
    plt.grid(axis='y')
    plt.show()


def rice_per_capita():
    """Returns a plot of yearly trend in rice volume vs. the change
    in population between 2015 and 2020."""

    # Plot
    df_annual_rice_import = top_countries_per_yr.groupby(
        'YR').sum().reset_index()

    df_annual_rice_import['DT_PER_MT'] = (df_annual_rice_import['SUM_DUTIES_AND_TAXES'] /
                                          df_annual_rice_import['SUM_NETMASS_MT'])

    df_annual_rice_import.set_index(['YR'], inplace=True)

    # Add the Population fields
    df_popn = pd.melt(df_census.query("LOCATION == 'PHILIPPINES'"),
                      id_vars='LOCATION', value_name='POPULATION',
                      var_name='YEAR').drop('LOCATION', axis=1)
    df_popn['YEAR'] = df_popn['YEAR'].str[-6:-2]
    df_popn.rename(columns={'YEAR': 'YR'}, inplace=True)

    df_rice_popn = pd.merge(df_annual_rice_import,
                            df_popn, how='left', on='YR')
    df_rice_popn.set_index(['YR'], inplace=True)

    df_rice_popn['KGS_PER_POPN'] = (df_rice_popn['SUM_NETMASS_MT'] * 1000 /
                                    df_rice_popn['POPULATION'])

    fig, ax = plt.subplots(figsize=(10, 5), dpi=200)

    ax.plot(df_rice_popn['SUM_NETMASS_MT'], linewidth=2,
            label='Rice Volume', zorder=1)

    ax2 = ax.twinx()
    ax2 = df_rice_popn['POPULATION'].plot(kind='bar', linewidth=2,
                                          secondary_y=True, zorder=2,
                                          label='Population', color=c_yellow,
                                          alpha=0.5)

    ax.yaxis.set_major_formatter(million_formatter)
    ax2.yaxis.set_major_formatter(million_formatter)

    ax.set_xlabel('YEAR')
    ax.set_ylabel('VOLUME (in Metric Tons)')
    ax2.set_ylabel('POPULATION')
    ax.set_ylim(0, 4e6)
    ax2.set_ylim(100e6, 110e6)
    ax2.legend(loc='upper right')
    ax.legend(loc='upper left')
    plt.grid(axis='y')
    plt.show()

Data Extraction¶

The customs and census datasets were obtained via a jojie-collected public dataset (directories: /mnt/data/public/customs and /mnt/data/public/census).

The raw data are all in xlsx file format which were saved to pickle files and sqlite3 databases.

The detailed steps performed, including the relevant documents used and created are documented below:

  • Step 1: Obtained the monthly importation excel files from 2015 to 2022 and saved those datasets to their respective yearly pickle files.
Jupyter Notebooks Pickle Files created
customs-sqllite-2015.ipynb 2015_imports.pkl
customs-sqllite-2016.ipynb 2016_imports.pkl
customs-sqllite-2017.ipynb 2017_imports.pkl
customs-sqllite-2018.ipynb 2018_imports.pkl
customs-sqllite-2019.ipynb 2019_imports.pkl
customs-sqllite-2020.ipynb 2020_imports.pkl
customs-sqllite-2021.ipynb 2021_imports.pkl
customs-sqllite-2022.ipynb 2022_imports.pkl
  • Step 2: Used the yearly '-imports.pkl' files to extract all the column names from the customs files and created a dictionary to map those extracted names to a standard column naming convention.
Jupyter Notebook Excel File created
colname_extraction.ipynb Column_Dict.xlsx
  • Step 3: Used the yearly '-imports.pkl' files from step 1 to create relevant tables and insert values in each year's '-customs.db'
Jupyter Notebooks SQLite3 Databases created
customs-create-sqllite-2015.ipynb 2015-customs.db
customs-create-sqllite-2016.ipynb 2016-customs.db
customs-create-sqllite-2017.ipynb 2017-customs.db
customs-create-sqllite-2018.ipynb 2018-customs.db
customs-create-sqllite-2019.ipynb 2019-customs.db
customs-create-sqllite-2020.ipynb 2020-customs.db
customs-create-sqllite-2021.ipynb 2021-customs.db
customs-create-sqllite-2022.ipynb 2022-customs.db
  • Step 4: Obtained the census excel file for 2020 (includes 2015 population) and saved such file to 'census.db' database.
Jupyter Notebook SQLite3 database created Excel file created
census-create-sqllite.ipynb census.db census.xlsx
  • Step 5: Combined the yearly imports database contents, created in step 3, into a single database named 'imports_combined.db'
Jupyter Notebook SQLite3 database created Excel file created
imports_combined.ipynb imports_combined.db imports_rowcounts.xlsx
  • Step 6: Inserted Reference Tables (Reference-Tables.ipynb) to 'imports_combined.db' database.

  • Step 7: Place the above files in the same directory.

Data Cleaning and Processing¶

Establish SQLite3 database connection¶

In [3]:
# Connect to SQL DB containing Philippine importation and census data
conn = sqlite3.connect('imports_combined.db')
conn2 = sqlite3.connect('census.db')

Customs Importation Data¶

In [4]:
# Read the sql query below and save it to a data frame

# sql = """
# CREATE TABLE summary AS
# SELECT MONTH_YEAR,
#     substr('0000000000'||HS_CODE, -11, 11) as HS_CODE,
#     PREFERENTIAL_CODE,
#     SUM(DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
#     SUM(VAT_BASE) as SUM_VAT_BASE,
#     SUM(VAT_PAID) as SUM_VAT_PAID,
#     SUM(DUTY_PAID) as SUM_DUTY_PAID,
#     SUM(DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
#     SUM(EXCISE_ADVALOREM_PAID) as SUM_EXCISE_ADVALOREM_PAID,
#     SUM(NET_MASS_KGS) as SUM_NET_MASS_KGS,

#     AVG(DUTIABLE_VALUE_PHP) as AVG_DUTIABLE_VALUE_PHP,
#     AVG(VAT_BASE) as AVG_VAT_BASE,
#     AVG(VAT_PAID) as AVG_VAT_PAID,
#     AVG(DUTY_PAID) as AVG_DUTY_PAID,
#     AVG(DUTIES_AND_TAXES) as AVG_DUTIES_AND_TAXES,
#     AVG(EXCISE_ADVALOREM_PAID) as AVG_EXCISE_ADVALOREM_PAID,
#     AVG(NET_MASS_KGS) as AVG_NET_MASS_KGS,

#     MIN(DUTIABLE_VALUE_PHP) as MIN_DUTIABLE_VALUE_PHP,
#     MIN(VAT_BASE) as MIN_VAT_BASE,
#     MIN(VAT_PAID) as MIN_VAT_PAID,
#     MIN(DUTY_PAID) as MIN_DUTY_PAID,
#     MIN(DUTIES_AND_TAXES) as MIN_DUTIES_AND_TAXES,
#     MIN(EXCISE_ADVALOREM_PAID) as MIN_EXCISE_ADVALOREM_PAID,
#     MIN(NET_MASS_KGS) as MIN_NET_MASS_KGS,

#     MAX(DUTIABLE_VALUE_PHP) as MAX_DUTIABLE_VALUE_PHP,
#     MAX(VAT_BASE) as MAX_VAT_BASE,
#     MAX(VAT_PAID) as MAX_VAT_PAID,
#     MAX(DUTY_PAID) as MAX_DUTY_PAID,
#     MAX(DUTIES_AND_TAXES) as MAX_DUTIES_AND_TAXES,
#     MAX(EXCISE_ADVALOREM_PAID) as MAX_EXCISE_ADVALOREM_PAID,
#     MAX(NET_MASS_KGS) as MAX_NET_MASS_KGS,

#     COUNT(*) as COUNT_ROWS
# FROM imports
# GROUP BY MONTH_YEAR, HS_CODE, PREFERENTIAL_CODE
# """

# df_imports_agg = pd.read_sql(sql, conn)
In [5]:
# Add country_export to the db

# sql = """
# CREATE TABLE summary_year_cntry AS
# SELECT MONTH_YEAR,
#     COUNTRY_EXPORT,
#     SUM(DUTIABLE_VALUE_PHP) as SUM_DUTIABLE_VALUE_PHP,
#     SUM(VAT_BASE) as SUM_VAT_BASE,
#     SUM(VAT_PAID) as SUM_VAT_PAID,
#     SUM(DUTY_PAID) as SUM_DUTY_PAID,
#     SUM(DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
#     SUM(EXCISE_ADVALOREM_PAID) as SUM_EXCISE_ADVALOREM_PAID,
#     SUM(NET_MASS_KGS) as SUM_NET_MASS_KGS,

#     AVG(DUTIABLE_VALUE_PHP) as AVG_DUTIABLE_VALUE_PHP,
#     AVG(VAT_BASE) as AVG_VAT_BASE,
#     AVG(VAT_PAID) as AVG_VAT_PAID,
#     AVG(DUTY_PAID) as AVG_DUTY_PAID,
#     AVG(DUTIES_AND_TAXES) as AVG_DUTIES_AND_TAXES,
#     AVG(EXCISE_ADVALOREM_PAID) as AVG_EXCISE_ADVALOREM_PAID,
#     AVG(NET_MASS_KGS) as AVG_NET_MASS_KGS,

#     MIN(DUTIABLE_VALUE_PHP) as MIN_DUTIABLE_VALUE_PHP,
#     MIN(VAT_BASE) as MIN_VAT_BASE,
#     MIN(VAT_PAID) as MIN_VAT_PAID,
#     MIN(DUTY_PAID) as MIN_DUTY_PAID,
#     MIN(DUTIES_AND_TAXES) as MIN_DUTIES_AND_TAXES,
#     MIN(EXCISE_ADVALOREM_PAID) as MIN_EXCISE_ADVALOREM_PAID,
#     MIN(NET_MASS_KGS) as MIN_NET_MASS_KGS,

#     MAX(DUTIABLE_VALUE_PHP) as MAX_DUTIABLE_VALUE_PHP,
#     MAX(VAT_BASE) as MAX_VAT_BASE,
#     MAX(VAT_PAID) as MAX_VAT_PAID,
#     MAX(DUTY_PAID) as MAX_DUTY_PAID,
#     MAX(DUTIES_AND_TAXES) as MAX_DUTIES_AND_TAXES,
#     MAX(EXCISE_ADVALOREM_PAID) as MAX_EXCISE_ADVALOREM_PAID,
#     MAX(NET_MASS_KGS) as MAX_NET_MASS_KGS,

#     COUNT(*) as COUNT_ROWS
# FROM imports
# GROUP BY MONTH_YEAR, COUNTRY_EXPORT
# ORDER BY MONTH_YEAR, COUNTRY_EXPORT
# """

# conn.execute(sql)
In [6]:
# Save the file to a pickle file for shorter processing time when we access
# # the same data moving forward

# df_imports_agg.to_pickle('df_imports_agg.pkl')
In [7]:
# Remove commas in country_export column
# sql = """
# UPDATE summary
#    SET country_export = substr(country_export, 1, instr(country_export,',')-1)
# WHERE country_export like '%,%'
# """

# conn.execute(sql)
In [8]:
# Load pickle file, read-only

with open('df_imports_agg.pkl', 'rb') as file:
    df_imports_agg = pickle.load(file)
In [9]:
# Preview data
display(df_imports_agg.head())

# Get the dimensions of the df
print(f'Number of rows: {df_imports_agg.shape[0]}\n'
      f'Number of columns: {df_imports_agg.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
print(df_imports_agg.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_imports_agg.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_imports_agg.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(df_imports_agg.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(df_imports_agg.isna().sum() / (len(df_imports_agg))*100)
MONTH_YEAR HS_CODE PREFERENTIAL_CODE SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS ... MIN_EXCISE_ADVALOREM_PAID MIN_NET_MASS_KGS MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID MAX_NET_MASS_KGS COUNT_ROWS
0 2015-1 10019019000 None 2,413,111,061.00 2,440,378,069.00 0.00 0.00 0.00 NaN 153,329,722.00 ... NaN 1000787.0 369,253,676.00 372,889,558.00 0.00 0.00 0.00 NaN 8006296.0 20
1 2015-1 10019099000 None 1,448,184,761.00 1,571,805,522.00 0.00 101,372,917.00 101,372,917.00 NaN 125,329,720.00 ... NaN 0.0 216,549,708.00 234,706,251.00 0.00 15,158,479.00 15,158,479.00 NaN 9460325.0 32
2 2015-1 10019099000 AIFTA 18,672,091.00 19,794,256.00 0.00 746,881.00 746,881.00 NaN 2,005,240.00 ... NaN 229540.0 6,410,765.00 6,789,302.00 0.00 256,430.00 256,430.00 NaN 691410.0 5
3 2015-1 10019099000 ANFTA 262,946,606.00 266,354,909.00 0.00 0.00 0.00 NaN 20,094,214.00 ... NaN 16129544.0 206,744,629.00 209,332,028.00 0.00 0.00 0.00 NaN 99390.0 13
4 2015-1 10019919000 None 77,393,616.00 78,178,366.00 9,381,401.00 0.00 9,381,401.00 NaN 4,470,871.00 ... NaN 406149.0 17,263,204.00 17,433,569.00 2,092,028.00 0.00 2,092,028.00 NaN 989386.0 7

5 rows × 32 columns

Number of rows: 1324823
Number of columns: 32

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1324823 entries, 0 to 1324822
Data columns (total 32 columns):
 #   Column                     Non-Null Count    Dtype  
---  ------                     --------------    -----  
 0   MONTH_YEAR                 1324823 non-null  object 
 1   HS_CODE                    1324823 non-null  object 
 2   PREFERENTIAL_CODE          671722 non-null   object 
 3   SUM_DUTIABLE_VALUE_PHP     1324823 non-null  float64
 4   SUM_VAT_BASE               1324823 non-null  float64
 5   SUM_VAT_PAID               1324823 non-null  float64
 6   SUM_DUTY_PAID              1324823 non-null  float64
 7   SUM_DUTIES_AND_TAXES       1324823 non-null  float64
 8   SUM_EXCISE_ADVALOREM_PAID  1105652 non-null  float64
 9   SUM_NET_MASS_KGS           1324806 non-null  float64
 10  AVG_DUTIABLE_VALUE_PHP     1324823 non-null  float64
 11  AVG_VAT_BASE               1324823 non-null  float64
 12  AVG_VAT_PAID               1324823 non-null  float64
 13  AVG_DUTY_PAID              1324823 non-null  float64
 14  AVG_DUTIES_AND_TAXES       1324823 non-null  float64
 15  AVG_EXCISE_ADVALOREM_PAID  1105652 non-null  float64
 16  AVG_NET_MASS_KGS           1324806 non-null  float64
 17  MIN_DUTIABLE_VALUE_PHP     1324823 non-null  float64
 18  MIN_VAT_BASE               1324823 non-null  float64
 19  MIN_VAT_PAID               1324823 non-null  float64
 20  MIN_DUTY_PAID              1324823 non-null  float64
 21  MIN_DUTIES_AND_TAXES       1324823 non-null  float64
 22  MIN_EXCISE_ADVALOREM_PAID  1105652 non-null  float64
 23  MIN_NET_MASS_KGS           1324806 non-null  object 
 24  MAX_DUTIABLE_VALUE_PHP     1324823 non-null  float64
 25  MAX_VAT_BASE               1324823 non-null  float64
 26  MAX_VAT_PAID               1324823 non-null  float64
 27  MAX_DUTY_PAID              1324823 non-null  float64
 28  MAX_DUTIES_AND_TAXES       1324823 non-null  float64
 29  MAX_EXCISE_ADVALOREM_PAID  1105652 non-null  float64
 30  MAX_NET_MASS_KGS           1324806 non-null  object 
 31  COUNT_ROWS                 1324823 non-null  int64  
dtypes: float64(26), int64(1), object(5)
memory usage: 323.4+ MB
None

Summary statistics for numerical columns:

SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS AVG_DUTIABLE_VALUE_PHP AVG_VAT_BASE AVG_VAT_PAID ... MIN_DUTY_PAID MIN_DUTIES_AND_TAXES MIN_EXCISE_ADVALOREM_PAID MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID COUNT_ROWS
count 1,324,823.00 1,324,823.00 1,324,823.00 1,324,823.00 1,324,823.00 1,105,652.00 1,324,806.00 1,324,823.00 1,324,823.00 1,324,823.00 ... 1,324,823.00 1,324,823.00 1,105,652.00 1,324,823.00 1,324,823.00 1,324,823.00 1,324,823.00 1,324,823.00 1,105,652.00 1,324,823.00
mean 34,929,634.89 36,822,070.21 3,633,220.59 626,018.18 4,113,051.56 677,901.78 717,764.97 2,624,132.48 2,748,433.15 366,429.17 ... 16,009.30 143,363.82 17,917.48 8,728,100.18 9,340,447.21 949,754.37 158,984.01 1,025,553.28 174,129.59 25.13
std 732,322,462.07 775,984,732.53 79,093,379.28 31,437,123.11 98,258,365.54 36,661,977.61 19,769,234.28 47,541,731.53 49,670,521.35 6,466,321.70 ... 423,020.96 2,594,409.27 942,233.57 570,526,018.53 615,282,651.97 57,513,699.75 30,281,140.47 71,766,654.04 18,925,504.66 126.76
min 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ... 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00
25% 166,852.50 179,316.38 24,841.60 0.00 8,251.00 0.00 377.12 59,091.00 64,159.46 8,235.97 ... 0.00 0.00 0.00 115,299.68 123,736.50 17,635.74 0.00 6,465.77 0.00 1.00
50% 1,668,926.68 1,756,291.00 191,009.00 3.00 149,662.00 0.00 7,460.18 304,490.32 322,282.50 37,755.95 ... 0.00 799.16 0.00 882,473.00 924,721.00 105,762.00 3.00 87,547.00 0.00 3.00
75% 9,419,254.94 9,873,365.88 1,012,921.25 56,728.00 914,510.50 0.00 52,630.51 1,120,654.80 1,163,528.08 140,875.29 ... 124.00 20,126.82 0.00 3,424,216.00 3,570,642.91 402,865.00 28,132.89 357,465.67 0.00 13.00
max 384,390,862,552.01 384,875,563,099.94 46,184,983,855.11 34,208,739,859.72 65,715,485,655.68 14,353,591,500.00 3,410,091,691.00 31,593,760,764.57 33,929,375,520.14 4,294,389,926.00 ... 139,739,494.00 648,718,679.46 210,221,920.00 242,616,739,570.34 262,056,064,649.71 31,446,727,757.96 34,144,111,833.19 65,590,839,591.15 13,499,424,000.00 13,505.00

8 rows × 27 columns

Summary statistics for object(string) columns:

count unique top freq
MONTH_YEAR 1324823 93 2015-9 132186
HS_CODE 1324823 17564 84799040000 3623
PREFERENTIAL_CODE 671722 90244 ACFTA 289827
MIN_NET_MASS_KGS 1324806 196538 0.01 84932
MAX_NET_MASS_KGS 1324806 229072 1.0 12932
Check for null columns:

MONTH_YEAR                        0
HS_CODE                           0
PREFERENTIAL_CODE            653101
SUM_DUTIABLE_VALUE_PHP            0
SUM_VAT_BASE                      0
SUM_VAT_PAID                      0
SUM_DUTY_PAID                     0
SUM_DUTIES_AND_TAXES              0
SUM_EXCISE_ADVALOREM_PAID    219171
SUM_NET_MASS_KGS                 17
AVG_DUTIABLE_VALUE_PHP            0
AVG_VAT_BASE                      0
AVG_VAT_PAID                      0
AVG_DUTY_PAID                     0
AVG_DUTIES_AND_TAXES              0
AVG_EXCISE_ADVALOREM_PAID    219171
AVG_NET_MASS_KGS                 17
MIN_DUTIABLE_VALUE_PHP            0
MIN_VAT_BASE                      0
MIN_VAT_PAID                      0
MIN_DUTY_PAID                     0
MIN_DUTIES_AND_TAXES              0
MIN_EXCISE_ADVALOREM_PAID    219171
MIN_NET_MASS_KGS                 17
MAX_DUTIABLE_VALUE_PHP            0
MAX_VAT_BASE                      0
MAX_VAT_PAID                      0
MAX_DUTY_PAID                     0
MAX_DUTIES_AND_TAXES              0
MAX_EXCISE_ADVALOREM_PAID    219171
MAX_NET_MASS_KGS                 17
COUNT_ROWS                        0
dtype: int64
Display % of null columns:

MONTH_YEAR                   0.00
HS_CODE                      0.00
PREFERENTIAL_CODE           49.30
SUM_DUTIABLE_VALUE_PHP       0.00
SUM_VAT_BASE                 0.00
SUM_VAT_PAID                 0.00
SUM_DUTY_PAID                0.00
SUM_DUTIES_AND_TAXES         0.00
SUM_EXCISE_ADVALOREM_PAID   16.54
SUM_NET_MASS_KGS             0.00
AVG_DUTIABLE_VALUE_PHP       0.00
AVG_VAT_BASE                 0.00
AVG_VAT_PAID                 0.00
AVG_DUTY_PAID                0.00
AVG_DUTIES_AND_TAXES         0.00
AVG_EXCISE_ADVALOREM_PAID   16.54
AVG_NET_MASS_KGS             0.00
MIN_DUTIABLE_VALUE_PHP       0.00
MIN_VAT_BASE                 0.00
MIN_VAT_PAID                 0.00
MIN_DUTY_PAID                0.00
MIN_DUTIES_AND_TAXES         0.00
MIN_EXCISE_ADVALOREM_PAID   16.54
MIN_NET_MASS_KGS             0.00
MAX_DUTIABLE_VALUE_PHP       0.00
MAX_VAT_BASE                 0.00
MAX_VAT_PAID                 0.00
MAX_DUTY_PAID                0.00
MAX_DUTIES_AND_TAXES         0.00
MAX_EXCISE_ADVALOREM_PAID   16.54
MAX_NET_MASS_KGS             0.00
COUNT_ROWS                   0.00
dtype: float64
  • Country data
In [10]:
# Extract importations data summary by year and country
sql = """
SELECT *
FROM summary_year_cntry
"""

df_summary_year_cntry = pd.read_sql(sql, conn)
df_summary_year_cntry['MONTH_YEAR'] = (pd.to_datetime(
                                       df_summary_year_cntry['MONTH_YEAR'],
                                       format='%Y-%m'))
In [11]:
# Preview data
display(df_summary_year_cntry.head())

# Get the dimensions of the df
print(f'Number of rows: {df_summary_year_cntry.shape[0]}\n'
      f'Number of columns: {df_summary_year_cntry.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
display(df_summary_year_cntry.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_summary_year_cntry.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_summary_year_cntry.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(df_summary_year_cntry.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(df_summary_year_cntry.isna().sum() / (len(df_summary_year_cntry))*100)
MONTH_YEAR COUNTRY_EXPORT SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS AVG_DUTIABLE_VALUE_PHP ... MIN_EXCISE_ADVALOREM_PAID MIN_NET_MASS_KGS MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID MAX_NET_MASS_KGS COUNT_ROWS
0 2015-01-01 None 1,064,784,798.00 1,095,264,620.00 129,498,498.00 19,800,478.00 221,626,576.00 342,464.00 20,482,566.33 570,930.19 ... 0.00 0.01 274,480,527.00 276,138,966.00 33,136,675.00 4,554,768.00 33,136,675.00 102,757.00 9892.8 1865
1 2015-01-01 ARGENTINA 1,528,122,757.00 1,651,336,941.00 11,609,722.00 108,126,300.00 119,925,877.00 181,739.00 83,521,009.88 23,153,375.11 ... 4.00 1.0 559,224,626.00 563,529,971.00 970,516.00 67,755,299.00 67,755,299.00 180,835.00 99730.0 66
2 2015-01-01 AUSTRALIA 3,762,129,469.00 3,867,970,414.00 280,259,798.00 47,513,309.00 335,666,913.00 7,701,001.00 174,863,099.36 2,032,484.86 ... 0.00 0.0 206,744,629.00 209,332,028.00 10,418,823.00 2,456,098.00 10,418,823.00 4,194,983.00 999.0 1851
3 2015-01-01 AUSTRIA 133,524,148.00 138,460,596.00 13,736,575.00 3,875,369.00 17,620,595.00 8,651.00 1,044,444.57 809,237.26 ... 8,651.00 0.0 17,372,400.00 17,584,537.00 2,110,144.00 372,951.00 2,283,868.00 8,651.00 96.93 165
4 2015-01-01 Andorra 606,871.00 705,233.00 84,628.00 91,030.00 175,658.00 NaN 890.00 606,871.00 ... NaN 890.0 606,871.00 705,233.00 84,628.00 91,030.00 175,658.00 NaN 890.0 1

5 rows × 31 columns

Number of rows: 12488
Number of columns: 31

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12488 entries, 0 to 12487
Data columns (total 31 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   MONTH_YEAR                 12488 non-null  datetime64[ns]
 1   COUNTRY_EXPORT             12415 non-null  object        
 2   SUM_DUTIABLE_VALUE_PHP     12488 non-null  float64       
 3   SUM_VAT_BASE               12488 non-null  float64       
 4   SUM_VAT_PAID               12488 non-null  float64       
 5   SUM_DUTY_PAID              12488 non-null  float64       
 6   SUM_DUTIES_AND_TAXES       12488 non-null  float64       
 7   SUM_EXCISE_ADVALOREM_PAID  11761 non-null  float64       
 8   SUM_NET_MASS_KGS           12488 non-null  float64       
 9   AVG_DUTIABLE_VALUE_PHP     12488 non-null  float64       
 10  AVG_VAT_BASE               12488 non-null  float64       
 11  AVG_VAT_PAID               12488 non-null  float64       
 12  AVG_DUTY_PAID              12488 non-null  float64       
 13  AVG_DUTIES_AND_TAXES       12488 non-null  float64       
 14  AVG_EXCISE_ADVALOREM_PAID  11761 non-null  float64       
 15  AVG_NET_MASS_KGS           12488 non-null  float64       
 16  MIN_DUTIABLE_VALUE_PHP     12488 non-null  float64       
 17  MIN_VAT_BASE               12488 non-null  float64       
 18  MIN_VAT_PAID               12488 non-null  float64       
 19  MIN_DUTY_PAID              12488 non-null  float64       
 20  MIN_DUTIES_AND_TAXES       12488 non-null  float64       
 21  MIN_EXCISE_ADVALOREM_PAID  11761 non-null  float64       
 22  MIN_NET_MASS_KGS           12488 non-null  object        
 23  MAX_DUTIABLE_VALUE_PHP     12488 non-null  float64       
 24  MAX_VAT_BASE               12488 non-null  float64       
 25  MAX_VAT_PAID               12488 non-null  float64       
 26  MAX_DUTY_PAID              12488 non-null  float64       
 27  MAX_DUTIES_AND_TAXES       12488 non-null  float64       
 28  MAX_EXCISE_ADVALOREM_PAID  11761 non-null  float64       
 29  MAX_NET_MASS_KGS           12488 non-null  object        
 30  COUNT_ROWS                 12488 non-null  int64         
dtypes: datetime64[ns](1), float64(26), int64(1), object(3)
memory usage: 3.0+ MB
None
Summary statistics for numerical columns:

SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS AVG_DUTIABLE_VALUE_PHP AVG_VAT_BASE AVG_VAT_PAID ... MIN_DUTY_PAID MIN_DUTIES_AND_TAXES MIN_EXCISE_ADVALOREM_PAID MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID COUNT_ROWS
count 12,488.00 12,488.00 12,488.00 12,488.00 12,488.00 11,761.00 12,488.00 12,488.00 12,488.00 12,488.00 ... 12,488.00 12,488.00 11,761.00 12,488.00 12,488.00 12,488.00 12,488.00 12,488.00 11,761.00 12,488.00
mean 3,705,604,074.30 3,906,368,155.09 385,439,957.98 66,412,819.46 436,344,115.05 63,729,568.45 76,145,046.43 11,067,629.26 11,460,454.62 1,084,135.69 ... 46,794.87 174,420.47 13,888.12 407,039,667.15 447,962,151.29 38,188,646.19 7,884,036.28 45,774,547.80 13,682,083.79 2,666.05
std 14,095,020,333.99 14,816,430,982.05 1,592,111,110.49 375,977,478.03 1,811,322,871.89 388,016,312.84 284,943,650.29 79,693,933.75 80,747,503.75 8,723,205.39 ... 1,205,425.25 4,753,053.89 574,955.84 5,495,077,973.95 5,977,250,806.31 571,316,830.74 311,814,904.01 721,736,953.70 183,076,557.99 11,309.06
min 10.00 0.00 0.00 0.00 0.00 0.00 0.00 10.00 0.00 0.00 ... 0.00 0.00 0.00 10.00 0.00 0.00 0.00 0.00 0.00 1.00
25% 2,867,075.75 3,136,836.50 105,160.00 47,028.75 137,810.18 0.00 18,000.00 430,834.33 461,284.43 18,977.55 ... 0.00 0.00 0.00 1,578,494.50 1,698,414.01 72,122.35 31,036.47 91,853.75 0.00 3.00
50% 41,182,518.70 43,836,093.50 2,755,241.85 1,007,581.50 3,400,088.35 0.00 357,861.30 1,229,490.22 1,304,991.08 84,159.39 ... 0.00 0.00 0.00 11,136,321.50 11,703,739.00 864,218.02 334,274.50 1,100,676.21 0.00 20.00
75% 902,908,137.33 960,135,889.46 73,796,232.94 20,663,530.97 96,047,085.42 551,171.00 10,457,382.96 3,110,764.81 3,298,299.59 222,621.01 ... 0.00 260.73 0.00 108,574,085.97 113,725,717.00 8,204,910.50 2,265,258.50 10,846,446.75 285,025.74 322.00
max 479,519,065,452.04 485,505,503,066.43 57,494,932,398.89 34,617,435,553.84 70,730,995,658.83 14,320,855,982.51 3,652,341,993.22 3,859,296,288.00 3,871,315,449.00 464,557,853.00 ... 124,497,907.00 464,557,853.00 44,168,776.00 242,616,739,570.34 262,056,064,649.71 31,446,727,757.96 34,144,111,833.19 65,590,839,591.15 13,499,424,000.00 202,281.00

8 rows × 27 columns

Summary statistics for object(string) columns:

count unique top freq
COUNTRY_EXPORT 12415 230 KOREA 159
MIN_NET_MASS_KGS 12488 3104 0.01 2669
MAX_NET_MASS_KGS 12488 6193 99000.0 220
Check for null columns:

MONTH_YEAR                     0
COUNTRY_EXPORT                73
SUM_DUTIABLE_VALUE_PHP         0
SUM_VAT_BASE                   0
SUM_VAT_PAID                   0
SUM_DUTY_PAID                  0
SUM_DUTIES_AND_TAXES           0
SUM_EXCISE_ADVALOREM_PAID    727
SUM_NET_MASS_KGS               0
AVG_DUTIABLE_VALUE_PHP         0
AVG_VAT_BASE                   0
AVG_VAT_PAID                   0
AVG_DUTY_PAID                  0
AVG_DUTIES_AND_TAXES           0
AVG_EXCISE_ADVALOREM_PAID    727
AVG_NET_MASS_KGS               0
MIN_DUTIABLE_VALUE_PHP         0
MIN_VAT_BASE                   0
MIN_VAT_PAID                   0
MIN_DUTY_PAID                  0
MIN_DUTIES_AND_TAXES           0
MIN_EXCISE_ADVALOREM_PAID    727
MIN_NET_MASS_KGS               0
MAX_DUTIABLE_VALUE_PHP         0
MAX_VAT_BASE                   0
MAX_VAT_PAID                   0
MAX_DUTY_PAID                  0
MAX_DUTIES_AND_TAXES           0
MAX_EXCISE_ADVALOREM_PAID    727
MAX_NET_MASS_KGS               0
COUNT_ROWS                     0
dtype: int64
Display % of null columns:

MONTH_YEAR                  0.00
COUNTRY_EXPORT              0.58
SUM_DUTIABLE_VALUE_PHP      0.00
SUM_VAT_BASE                0.00
SUM_VAT_PAID                0.00
SUM_DUTY_PAID               0.00
SUM_DUTIES_AND_TAXES        0.00
SUM_EXCISE_ADVALOREM_PAID   5.82
SUM_NET_MASS_KGS            0.00
AVG_DUTIABLE_VALUE_PHP      0.00
AVG_VAT_BASE                0.00
AVG_VAT_PAID                0.00
AVG_DUTY_PAID               0.00
AVG_DUTIES_AND_TAXES        0.00
AVG_EXCISE_ADVALOREM_PAID   5.82
AVG_NET_MASS_KGS            0.00
MIN_DUTIABLE_VALUE_PHP      0.00
MIN_VAT_BASE                0.00
MIN_VAT_PAID                0.00
MIN_DUTY_PAID               0.00
MIN_DUTIES_AND_TAXES        0.00
MIN_EXCISE_ADVALOREM_PAID   5.82
MIN_NET_MASS_KGS            0.00
MAX_DUTIABLE_VALUE_PHP      0.00
MAX_VAT_BASE                0.00
MAX_VAT_PAID                0.00
MAX_DUTY_PAID               0.00
MAX_DUTIES_AND_TAXES        0.00
MAX_EXCISE_ADVALOREM_PAID   5.82
MAX_NET_MASS_KGS            0.00
COUNT_ROWS                  0.00
dtype: float64
In [12]:
# Drop country_export with null values
df_summary_year_cntry.dropna(subset='COUNTRY_EXPORT', inplace=True)
df_summary_year_cntry.isna().sum()
Out[12]:
MONTH_YEAR                     0
COUNTRY_EXPORT                 0
SUM_DUTIABLE_VALUE_PHP         0
SUM_VAT_BASE                   0
SUM_VAT_PAID                   0
SUM_DUTY_PAID                  0
SUM_DUTIES_AND_TAXES           0
SUM_EXCISE_ADVALOREM_PAID    727
SUM_NET_MASS_KGS               0
AVG_DUTIABLE_VALUE_PHP         0
AVG_VAT_BASE                   0
AVG_VAT_PAID                   0
AVG_DUTY_PAID                  0
AVG_DUTIES_AND_TAXES           0
AVG_EXCISE_ADVALOREM_PAID    727
AVG_NET_MASS_KGS               0
MIN_DUTIABLE_VALUE_PHP         0
MIN_VAT_BASE                   0
MIN_VAT_PAID                   0
MIN_DUTY_PAID                  0
MIN_DUTIES_AND_TAXES           0
MIN_EXCISE_ADVALOREM_PAID    727
MIN_NET_MASS_KGS               0
MAX_DUTIABLE_VALUE_PHP         0
MAX_VAT_BASE                   0
MAX_VAT_PAID                   0
MAX_DUTY_PAID                  0
MAX_DUTIES_AND_TAXES           0
MAX_EXCISE_ADVALOREM_PAID    727
MAX_NET_MASS_KGS               0
COUNT_ROWS                     0
dtype: int64
  • Chapters
In [13]:
# Read the dataset and display head
sql = """
SELECT *
FROM chapters
"""
df_chapters = pd.read_sql(sql, conn)
In [14]:
# Preview data
display(df_chapters.head())

# Get the dimensions of the df
# Get the dimensions of the df
print(f'Number of rows: {df_chapters.shape[0]}\n'
      f'Number of columns: {df_chapters.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
display(df_chapters.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_chapters.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_chapters.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(df_chapters.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(df_chapters.isna().sum() / (len(df_chapters))*100)
HSCODE_2 CHAPTER
0 01 Live animals
1 02 Meat and edible meat offal
2 03 Fish and crustaceans, molluscs and other aquatic invertebrates
3 04 Dairy produce; birds' eggs; natural honey; edible products of animal origin, not elsewhere specified or included
4 05 Products of animal origin, not elsewhere specified or included
Number of rows: 97
Number of columns: 2

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97 entries, 0 to 96
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   HSCODE_2  97 non-null     object
 1   CHAPTER   97 non-null     object
dtypes: object(2)
memory usage: 1.6+ KB
None
Summary statistics for numerical columns:

HSCODE_2 CHAPTER
count 97 97
unique 97 97
top 01 Live animals
freq 1 1
Summary statistics for object(string) columns:

count unique top freq
HSCODE_2 97 97 01 1
CHAPTER 97 97 Live animals 1
Check for null columns:

HSCODE_2    0
CHAPTER     0
dtype: int64
Display % of null columns:

HSCODE_2   0.00
CHAPTER    0.00
dtype: float64

Refer to Notes below for notes about the completeness of the data.

Rice Importation

In [15]:
# Create a df to store yearly importation of rice
df_rice_imports = pd.read_sql("""
    SELECT substr(month_year,1,4) as YR, COUNTRY_EXPORT,
           SUM(SUM_NET_MASS_KGS)/1000 as SUM_NETMASS_MT
    FROM summary
    WHERE substr('0000000000'||HS_CODE, -11, 11) like '1006%'
    GROUP BY YR, COUNTRY_EXPORT
    """, conn)
In [16]:
# Preview data
display(df_rice_imports.head())

# Get the dimensions of the df
print(f'Number of rows: {df_rice_imports.shape[0]}\n'
      f'Number of columns: {df_rice_imports.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
display(df_rice_imports.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_rice_imports.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_rice_imports.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(df_rice_imports.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(df_rice_imports.isna().sum() / (len(df_rice_imports))*100)
YR COUNTRY_EXPORT SUM_NETMASS_MT
0 2015 None 2,500.00
1 2015 BRAZIL 0.33
2 2015 CHINA 5,702.03
3 2015 INDIA 11,342.79
4 2015 INDONESIA 0.17
Number of rows: 131
Number of columns: 3

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 131 entries, 0 to 130
Data columns (total 3 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   YR              131 non-null    object 
 1   COUNTRY_EXPORT  126 non-null    object 
 2   SUM_NETMASS_MT  131 non-null    float64
dtypes: float64(1), object(2)
memory usage: 3.2+ KB
None
Summary statistics for numerical columns:

SUM_NETMASS_MT
count 131.00
mean 130,655.48
std 424,588.40
min 0.00
25% 1.75
50% 300.00
75% 19,840.02
max 2,521,693.07
Summary statistics for object(string) columns:

count unique top freq
YR 131 8 2021 23
COUNTRY_EXPORT 126 31 INDIA 8
Check for null columns:

YR                0
COUNTRY_EXPORT    5
SUM_NETMASS_MT    0
dtype: int64
Display % of null columns:

YR               0.00
COUNTRY_EXPORT   3.82
SUM_NETMASS_MT   0.00
dtype: float64
In [17]:
# Drop country_export with null values
df_rice_imports.dropna(subset='COUNTRY_EXPORT', inplace=True)
df_rice_imports.isna().sum()
Out[17]:
YR                0
COUNTRY_EXPORT    0
SUM_NETMASS_MT    0
dtype: int64
In [18]:
# Create a df to store top countries where PH imports rice
topn_rice = 40
top_rice_importers = (df_rice_imports.groupby(['YR', 'COUNTRY_EXPORT'])
                                     .sum()
                                     .nlargest(topn_rice, 'SUM_NETMASS_MT'))

list_top_rice_countries = (top_rice_importers.reset_index()['COUNTRY_EXPORT']
                           .unique().tolist())

paramlist = r'?'

for i in range(1, len(list_top_rice_countries)):
    paramlist = paramlist + r', ?'

sql = f"""
SELECT substr(month_year,1,4) as YR, COUNTRY_EXPORT,
       SUM(SUM_NET_MASS_KGS)/1000 as SUM_NETMASS_MT,
       SUM(SUM_DUTIES_AND_TAXES) as SUM_DUTIES_AND_TAXES,
       SUM(SUM_DUTIABLE_VALUE_PHP+SUM_DUTIES_AND_TAXES)/SUM(SUM_NET_MASS_KGS)
       as SUM_DV_DT_PER_KGS
FROM summary
WHERE substr('0000000000'||HS_CODE, -11, 11) like '1006%'
  AND country_export in ({paramlist})
GROUP BY YR, country_export
"""

top_countries_per_yr = pd.read_sql(sql, conn, params=list_top_rice_countries)
In [19]:
# Preview data
display(top_countries_per_yr.head())

# Get the dimensions of the df
print(f'Number of rows: {top_countries_per_yr.shape[0]}\n'
      f'Number of columns: {top_countries_per_yr.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
display(top_countries_per_yr.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(top_countries_per_yr.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(top_countries_per_yr.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(top_countries_per_yr.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(top_countries_per_yr.isna().sum() / (len(top_countries_per_yr))*100)
YR COUNTRY_EXPORT SUM_NETMASS_MT SUM_DUTIES_AND_TAXES SUM_DV_DT_PER_KGS
0 2015 CHINA 5,702.03 525,216.00 145.36
1 2015 INDIA 11,342.79 30,228,947.00 105.98
2 2015 SINGAPORE 0.12 3,454.00 411.80
3 2015 THAILAND 730,930.80 2,125,989,633.00 21.10
4 2015 VIET NAM 953,476.43 1,792,409,848.00 19.99
Number of rows: 52
Number of columns: 5

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   YR                    52 non-null     object 
 1   COUNTRY_EXPORT        52 non-null     object 
 2   SUM_NETMASS_MT        52 non-null     float64
 3   SUM_DUTIES_AND_TAXES  52 non-null     float64
 4   SUM_DV_DT_PER_KGS     52 non-null     float64
dtypes: float64(3), object(2)
memory usage: 2.2+ KB
None
Summary statistics for numerical columns:

SUM_NETMASS_MT SUM_DUTIES_AND_TAXES SUM_DV_DT_PER_KGS
count 52.00 52.00 52.00
mean 327,942.76 1,661,019,019.87 58.61
std 627,462.98 3,803,797,068.29 68.05
min 0.12 0.00 3.38
25% 10,745.61 27,882,929.73 22.38
50% 61,853.48 266,831,101.32 25.63
75% 277,984.47 1,012,433,798.02 85.87
max 2,521,693.07 16,415,236,247.53 411.80
Summary statistics for object(string) columns:

count unique top freq
YR 52 8 2017 7
COUNTRY_EXPORT 52 7 CHINA 8
Check for null columns:

YR                      0
COUNTRY_EXPORT          0
SUM_NETMASS_MT          0
SUM_DUTIES_AND_TAXES    0
SUM_DV_DT_PER_KGS       0
dtype: int64
Display % of null columns:

YR                     0.00
COUNTRY_EXPORT         0.00
SUM_NETMASS_MT         0.00
SUM_DUTIES_AND_TAXES   0.00
SUM_DV_DT_PER_KGS      0.00
dtype: float64

Notes¶

  • PREFERENTIAL_CODE pertains to codes used to refer to importations from countries with free trade agreements with the Philippines. Importations with these codes pay lower or zero duties. This is only available for selected transactions and countries.

  • Not all importations are required to pay an excise tax, so it's normal for PREFERENTIAL_CODE to have null values. Therefore, no additional clean-up or data transformation is required to address null/incomplete Preferential Code and Excise Ad Valorem tax values in the data frame since those columns are optional or on an as-applicable basis.

  • On the other hand, records with NULL values in the COUNTRY_EXPORT column of the data frame were removed because there's no other reliable source to identify which country the record belongs to. The COUNTRY_ORIGIN field which sometimes contains the 2-character code of the country cannot be used. Exploring this column showed that many records contain the same COUNTRY_ORIGIN but different COUNTRY_EXPORT values. Moreover, the total annual dutiable values of these records with NULL COUNTRY_EXPORT is less than 1% of the total dutiable value of imports for each year.

  • On another note, since the files were too large to save to a data frame at once (>6 GB), the team had to create different data frames for specific grouping by categories as itemized and processed above (i.e., Chapters and Tariff Codes, Rice Importation, and Census).

Census Data¶

In [20]:
# Read the dataset and display head
sql2 = """
SELECT *
FROM census
"""
df_census = pd.read_sql(sql2, conn2)
In [21]:
# Preview data
display(df_census.head())

# Get the dimensions of the df
print(f'Number of rows: {df_census.shape[0]}\n'
      f'Number of columns: {df_census.shape[1]}\n')

# List the datatype of each column and count of non-null values
print(f'List of all columns, count of non-null values, and datatypes of '
      'the df:\n')
display(df_census.info())

# Get summary statistics for the numerical columns and transpose the DataFrame
# for more readable output
print(f'\nSummary statistics for numerical columns:\n')
display(df_census.describe())

# Get summary statistics for the object (string) columns
print(f'Summary statistics for object(string) columns:\n')
display(df_census.describe(include=[object]).T)

# Get count of null values
print(f'Check for null columns:\n')
display(df_census.isna().sum())

# display % of null values per column
print(f'Display % of null columns:\n')
display(df_census.isna().sum() / (len(df_census))*100)
LOCATION POPULATION_2015-8 POPULATION_2020-5
0 PHILIPPINES 100,981,437.00 109,035,343.00
1 NATIONAL CAPITAL REGION (NCR) 12,877,253.00 13,484,462.00
2 CITY OF MANILA 1,780,148.00 1,846,513.00
3 CITY OF MANDALUYONG 386,276.00 425,758.00
4 CITY OF MARIKINA 450,741.00 456,059.00
Number of rows: 135
Number of columns: 3

List of all columns, count of non-null values, and datatypes of the df:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135 entries, 0 to 134
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   LOCATION           135 non-null    object 
 1   POPULATION_2015-8  135 non-null    float64
 2   POPULATION_2020-5  135 non-null    float64
dtypes: float64(2), object(1)
memory usage: 3.3+ KB
None
Summary statistics for numerical columns:

POPULATION_2015-8 POPULATION_2020-5
count 135.00 135.00
mean 2,244,000.32 2,422,976.54
std 8,847,721.60 9,558,833.63
min 17,246.00 18,831.00
25% 388,495.50 427,890.00
50% 689,668.00 728,402.00
75% 1,677,498.50 1,776,898.00
max 100,981,437.00 109,035,343.00
Summary statistics for object(string) columns:

count unique top freq
LOCATION 135 135 PHILIPPINES 1
Check for null columns:

LOCATION             0
POPULATION_2015-8    0
POPULATION_2020-5    0
dtype: int64
Display % of null columns:

LOCATION            0.00
POPULATION_2015-8   0.00
POPULATION_2020-5   0.00
dtype: float64

NOTE: The census dataset is complete and without any null values, so no additional data transformation required.

Other processing steps¶

In [22]:
# Display the importation data
df_imports_agg.head()
Out[22]:
MONTH_YEAR HS_CODE PREFERENTIAL_CODE SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS ... MIN_EXCISE_ADVALOREM_PAID MIN_NET_MASS_KGS MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID MAX_NET_MASS_KGS COUNT_ROWS
0 2015-1 10019019000 None 2,413,111,061.00 2,440,378,069.00 0.00 0.00 0.00 NaN 153,329,722.00 ... NaN 1000787.0 369,253,676.00 372,889,558.00 0.00 0.00 0.00 NaN 8006296.0 20
1 2015-1 10019099000 None 1,448,184,761.00 1,571,805,522.00 0.00 101,372,917.00 101,372,917.00 NaN 125,329,720.00 ... NaN 0.0 216,549,708.00 234,706,251.00 0.00 15,158,479.00 15,158,479.00 NaN 9460325.0 32
2 2015-1 10019099000 AIFTA 18,672,091.00 19,794,256.00 0.00 746,881.00 746,881.00 NaN 2,005,240.00 ... NaN 229540.0 6,410,765.00 6,789,302.00 0.00 256,430.00 256,430.00 NaN 691410.0 5
3 2015-1 10019099000 ANFTA 262,946,606.00 266,354,909.00 0.00 0.00 0.00 NaN 20,094,214.00 ... NaN 16129544.0 206,744,629.00 209,332,028.00 0.00 0.00 0.00 NaN 99390.0 13
4 2015-1 10019919000 None 77,393,616.00 78,178,366.00 9,381,401.00 0.00 9,381,401.00 NaN 4,470,871.00 ... NaN 406149.0 17,263,204.00 17,433,569.00 2,092,028.00 0.00 2,092,028.00 NaN 989386.0 7

5 rows × 32 columns

Update MONTH_YEAR column to datetime format. Then create a new column named HSCODE_2 where we get the first 2 characters of the HS_CODE which correspond to the chapter an item belongs to. And finally, merge the imports dataframe with the df_chapters dataframe:

In [23]:
df_imports_agg['MONTH_YEAR'] = (pd.to_datetime(
    df_imports_agg['MONTH_YEAR'], format='%Y-%m'))
df_imports_agg['HSCODE_2'] = df_imports_agg['HS_CODE'].str[:2]
df_imports_agg = df_imports_agg.merge(df_chapters, on='HSCODE_2')
df_imports_agg.head()
Out[23]:
MONTH_YEAR HS_CODE PREFERENTIAL_CODE SUM_DUTIABLE_VALUE_PHP SUM_VAT_BASE SUM_VAT_PAID SUM_DUTY_PAID SUM_DUTIES_AND_TAXES SUM_EXCISE_ADVALOREM_PAID SUM_NET_MASS_KGS ... MAX_DUTIABLE_VALUE_PHP MAX_VAT_BASE MAX_VAT_PAID MAX_DUTY_PAID MAX_DUTIES_AND_TAXES MAX_EXCISE_ADVALOREM_PAID MAX_NET_MASS_KGS COUNT_ROWS HSCODE_2 CHAPTER
0 2015-01-01 10019019000 None 2,413,111,061.00 2,440,378,069.00 0.00 0.00 0.00 NaN 153,329,722.00 ... 369,253,676.00 372,889,558.00 0.00 0.00 0.00 NaN 8006296.0 20 10 Cereals
1 2015-01-01 10019099000 None 1,448,184,761.00 1,571,805,522.00 0.00 101,372,917.00 101,372,917.00 NaN 125,329,720.00 ... 216,549,708.00 234,706,251.00 0.00 15,158,479.00 15,158,479.00 NaN 9460325.0 32 10 Cereals
2 2015-01-01 10019099000 AIFTA 18,672,091.00 19,794,256.00 0.00 746,881.00 746,881.00 NaN 2,005,240.00 ... 6,410,765.00 6,789,302.00 0.00 256,430.00 256,430.00 NaN 691410.0 5 10 Cereals
3 2015-01-01 10019099000 ANFTA 262,946,606.00 266,354,909.00 0.00 0.00 0.00 NaN 20,094,214.00 ... 206,744,629.00 209,332,028.00 0.00 0.00 0.00 NaN 99390.0 13 10 Cereals
4 2015-01-01 10019919000 None 77,393,616.00 78,178,366.00 9,381,401.00 0.00 9,381,401.00 NaN 4,470,871.00 ... 17,263,204.00 17,433,569.00 2,092,028.00 0.00 2,092,028.00 NaN 989386.0 7 10 Cereals

5 rows × 34 columns

Results and Discussions

Exploratory Data Analysis (EDA)¶

OBJECTIVE:

The objective of this EDA is to identify outliers, trends, and patterns, and provide insights regarding the Philippine Importations from 2015 to 2022, especially those years during PRRD's Term, and to determine imported rice consumption patterns of Filipinos between 2015 (before RTL) and 2020 (a year after RTL implementation).

QUESTIONS:

1. Are there outliers in the importation data?
2. What are the trends involving Philippine importation from 2015 to 2022?
    2.1 Yearly
    2.2 Monthly
3. Which chapters dominate the Philippine importation scene? Which chapters are weak?
    3.1 During PRRD's Term (June 2016 to June 2022)
    3.2 Yearly (from 2015 to 2022)
4. Which countries dominate the Philippine importation scene? 
   Are there noticeable patterns in the countries where we typically import?
    4.1 During PRRD's Term (June 2016 to June 2022)
    4.2 Yearly (from 2015 to 2022)
5. Are there significant changes in imported rice consumption and trends before 
   and after RTL implementation?

Are there outliers in the importation data?¶

Outliers were identified in the imports data. The process of identification and removal of the outliers were performed in "Customs-outliers.ipynb" file.

Out of all the identified outliers, only three items were considered invalid. The individual importation records for those three items were determined to be invalid because those items were recorded at above PHP 1 trillion dutiable value for a single declaration which have never happened between 2015 to Sept 2022.

The invalid rows removed from the dataset are shown below:

In [24]:
outliers()
Out[24]:
MONTH_YEAR HS_CODE ACTUAL_DUTY COUNTRY_EXPORT COUNTRY_ORIGIN CURRENCY DUTIABLE_VALUE_FOREIGN DUTIABLE_VALUE_PHP DUTIES_AND_TAXES DUTY_PAID EXCHANGE_RATE EXCISE_ADVALOREM_PAID GOODS_DESCRIPTION NET_MASS_KGS PREFERENTIAL_CODE VAT_BASE VAT_PAID
0 2019-5 85331090000 None JAPAN JP USD 85331090061.0 4,481,759,512,184.00 0.00 0.00 52.522 0.00 RESISTOR 5.3 None 4,487,361,717,889.00 538,483,406,146.00
1 2021-3 73182990000 None GERMANY, FEDERAL REPUBLIC OF DE USD 284808279857.97 13,862,473,405,527.00 3,217,563,438,244.03 1,386,247,340,552.70 48.673 0.00 SCREW 0.5 None 15,260,967,480,761.10 1,831,316,097,691.33
2 2021-3 73269099000 None GERMANY, FEDERAL REPUBLIC OF DE USD 117771598642.4 5,732,297,020,721.54 1,652,371,150,129.57 859,844,553,108.23 48.673 0.00 THREAD INSERT 0.5 None 6,604,388,308,511.18 792,526,597,021.34

What are the trends involving Philippine importation from 2015 to 2022?¶

Yearly trend based on dutiable value and duties and taxes¶

In [25]:
trends_yearly()

INSIGHTS:

  • The dutiable value of Philippine imports had a stable increase of 10% from 2016 to 2018. It also had a steady dutiable value of at least Php 6.1 trillion from 2018 up to the first half of 2022. The dutiable value of Philippine imports from other countries had a stable increase of 10% from 2016 to 2018. There's a little decline in 2020 due to lockdown protocols implemented during the peak of the pandemic. A slight increase in 2021 was brought about by the delays on recording from the previous year.

  • The massive increase in dutiable value did not magnify the trend in duties and taxes.

  • Assessed duties and taxes of dutiable imports increased massively starting in 2020. The 8-9% duties and taxes rate from 2016-2019 increased to 15-16% from 2020 onwards. This was mainly due to the implementation of online filing starting in 2020 to deliberately record the correct duties and taxes of importations.

  • Interestingly, this period when BOC had increased duties and taxes was also former Commissioner Rey "Jagger" Guerrero and his reform team's time in the Bureau of Customs. He is a straight arrow and a true reformist known for his integrity and professionalism in making the right things done. It was when BOC started its "Transformation Program" and enrolled in the Performance Governance System of the Institute for Solidarity in Asia from which they got the prestigious Island of Good Governance distinction after passing all four stages of the system with Gold Trailblazer awards.

Monthly trend based on dutiable value and duties and taxes¶

In [26]:
trends_monthly()

INSIGHTS:

  • The monthly total dutiable value of imports had a decline coinciding with the start of the lockdown. However, it reached an all-time high a few months later. Upon inspection of the records, the sudden peak in total importation values can be attributed to electronics, specifically laptops. This was the time when offices were shifting to a work-from-home setup and some schools enforced online learning.

  • Since the pandemic, the trend of the monthly total dutiable values of imports had been steadily increasing. For the last 4 months (June-Sept 2022), the monthly dutiable value was above 700 billion PHP. This threshold has only been surpassed four other times before (Nov 2016, March 2019, July 2020, and March 2022).

  • The biggest spikes in dutiable value didn't reflect to the duties and taxes. This means that the items that caused the spikes had tax exemptions. This can be seen in the effective rate of duties and taxes over the dutiable value graph where the dips in percentages coincide with the peaks in dutiable value.

  • Before the pandemic, the effective rate of duties and taxes over dutiable value was consistently playing at around less than 10%. It spiked at the start of the pandemic, and presently, it seems to have stabilized at around 15%.

  • BOC shifted to online filing of goods declaration and supporting documents when the COVID-19 pandemic started in March 2020. BOC banked on the new IT online systems to reduce face-to-face interaction and subsequently curb graft and corruption when transacting with the Bureau. The changes in the IT systems and processes of BOC implemented under the administration of former BOC Commissioner Rey Leonardo B. Guerrero seem to have contributed to the significant increase in duties and taxes of importations since March 2020 since all transactions are online and can be seen in the head office's Commissioner's dashboard immediately. The upward trend of the charts above clearly shows the remarkable improvement in the assessment of duties and taxes since 2019.

Which chapters dominate the Philippine importation scene? Which chapters are weak?¶

Top and bottom chapters during PRRD's term¶

In [27]:
prrd_top_bot_chapters()

INSIGHTS:

  • 70% of total dutiable value came from the top 10 chapters while 74% of total duties and taxes (DT) came from the top 10 chapters.
  • Electrical machinery dominated the top 10 chapters based on DV. While mineral fuels chapter is in 2nd place. On the other hand, in terms of DT, the top 2 switched places. This is because the effective duties and taxes rate (Sum of VAT, Excise, and Duties) of Electrical machinery is only 6% as opposed to 23% in mineral fuels. Also, the Electrical machinery chapter had a lower effective tax rate due to most of it was imported from China which had preferential rates that either required them to pay a lower rate or zero duties and/or VAT.
  • According to DTI's Trade Statistics website, electronic machinery and equipment is also the Philippines' top 1 export product with at least 53% of total exports over the last five years.
  • Vegetable plaiting is consistently the least imported product based on DV and DT.
  • Importation of live animals and plants is highly regulated and requires permits and clearances from the Bureau of Animal Industry and Bureau of Plant Industry, so it makes sense to be part of the bottom chapters that the Philippines imports.

Yearly top chapters from 2015-2022¶

Compare top 5 chapters between dutiable value and duties & taxes¶

In [28]:
top5_compare_chapters()

INSIGHTS:

  • Top 5 Chapters were consistent with both Top Dutiable Values and Top Duties & Taxes
  • These chapters are 85, 27, 84, 87, and 39. However, rankings differ from each other with respect to Dutiable Values and Duties & Taxes
  • Top 1 & 2 in terms of Dutiable Value are Chapters 85 and 27, respectively.
  • But, Chapter 27 leads in terms of Duties & Taxes.
  • Chapter 27 (Electrical machineries) has lower tax rate (6%) compared to that of Chapter 85 (23%).
  • The contributing factor to the relatively low tax rate of Chapter 27 is the country source of most Electrial machineries which is China. Preferential rates are applied for items imported from China.

Top chapters from 2015-2022¶

In [29]:
yearly_top20_chapters()

INSIGHTS

  • As mentioned earlier, chapter 27 leads the duties & taxes assessed, and this is because of the higher tax rate in this chapter.
  • Even though from 2018 to 2019, there is a noticeable dutiable value decrease for chapter 27, the trend of its duties & taxes was not affected, in fact, it even increased.
  • This is because the fuel excise tax was increased to PHP7 in 2018, continued increasing to PHP9 in 2019, and further increased to PHP10 from 2020 onwards.
  • With this, for 2019 onwards, duties & taxes for chapter 27 are at relatively high values.
  • The big dip in 2019 for chapter 27 reflects the 12.6 percent decrease in importations from this chapter. Refineries experienced disruptions because of earthquakes, which reduced crude oil imports.
  • Chapter 85, had a decreasing dutiable value starting year 2020.
  • Importations for chapter 85 were mostly from China. Consistent with the analysis of importations per country, China had a decreasing rate of importations starting this year as well which may explain the decreasing trend for this year onwards.
In [30]:
compare_top2_chapters()

INSIGHTS

  • The observed trend was due to the online filing of goods declaration and supporting documents when the COVID-19 started, consistent with the above insights on this.
  • The upward trend for assessed duties & taxes for chapter 85 from 2020 to 2021, as opposed to its dutiable value dip on the same year, clearly shows the remarkable improvement in the assessment of duties and taxes, in the instance of this chapter.

Compare bottom 5 chapters¶

In [31]:
bottom5_compare_chapters()

INSIGHTS:

  • As opposed to what is reflected in the top 5 chapters based on the dutiable values and duties & taxes, only three chapters are consistently at the bottom: Chapters 14, 06, and 45.
  • Chapter 14 - Vegetable Planting Materials
  • Chapter 06 - Live trees and other plants; bulbs, roots and the like; cut flowers and ornamental foliage
  • Chapter 45 - Cork and articles of cork
In [32]:
yearly_bottom5_chapters()

INSIGHTS:

  • It can be seen that even though there is consistent import on chapter 14, as reflected in dutiable values, there were no duties & taxes from this chapter from the years 2018 to 2019.
  • Chapter 50 is observed only in the bottom 5 chapters in duties & taxes, but not in dutiable values. This might be because the tax rate on these items is lower than the other items in the bottom chapters based on dutiable values.
  • Chapters 6 and 97 were only reflected once on dutiable values, 2016 and 2022 respectively. Chapter 6 is not reflected in the bottom 5 chapters based on the duties & taxes while there is no record for the duties and taxes assessed for chapter 97 in year 2022.

Which countries dominate the Philippine importation scene? Are there noticeable patterns in the countries where we typically import?¶

Top countries during PRRD's Term¶

In [33]:
prrd_top_countries()
sum of dutiable value during PRRD's Term   = 36.60 trillion
sum of duties and taxes during PRRD's Term = 4.10 trillion
COUNTRY_EXPORT SUM_DUTIABLE_VALUE_PHP % OF TOTAL
0 CHINA 7,170,404,163,498.27 19.59
1 JAPAN 3,867,783,984,392.89 10.57
2 KOREA 3,274,610,539,045.60 8.95
3 UNITED STATES 2,580,633,510,576.17 7.05
4 SINGAPORE 2,236,211,172,285.96 6.11
5 THAILAND 2,205,012,917,816.22 6.02
6 INDONESIA 2,124,130,447,336.61 5.80
7 HONG KONG 1,774,062,107,696.08 4.85
8 TAIWAN 1,660,756,199,020.07 4.54
9 MALAYSIA 1,400,963,233,365.91 3.83
COUNTRY_EXPORT SUM_DUTIES_AND_TAXES % OF TOTAL
0 CHINA 899,059,707,407.24 21.95
1 KOREA 396,657,896,234.21 9.68
2 JAPAN 360,157,525,006.46 8.79
3 INDONESIA 319,177,254,798.15 7.79
4 THAILAND 305,772,745,263.82 7.46
5 SINGAPORE 292,424,740,426.86 7.14
6 MALAYSIA 197,255,985,481.58 4.82
7 UNITED STATES 184,232,648,605.71 4.50
8 VIET NAM 139,382,163,316.48 3.40
9 TAIWAN 122,886,160,713.98 3.00

INSIGHTS:

  • Except for the United States, 9 out of the 10 countries with the highest dutiable value when it comes to imports are Asian countries. 4 out of these belong to ASEAN countries. These top 10 countries amounted to 77.31% of the total dutiable value imported during PRRD's term. Lastly, China is the biggest importer in terms of dutiable value which amounted to 19.59% of the total during Duterte's term.

  • Similar to dutiable values, 9 out of the 10 countries with highest duties and taxes from importing are Asian countries except for the US. 5 out of these also belong to the ASEAN countries. The top 10 countries contributed 78.53% of the total duties and taxes from imports during Duterte's term. Similarly, China is the highest tax payer which was assessed to have 21.95% of the total duties and taxes from imports during PRRD's term.

  • Hong Kong is among the top 10 countries in terms of dutiable value of imports, but is not included in the 10 highest payers of duties and taxes.

  • Vietnam is among the top 10 countries in terms of duties and taxes, but is not among the importers with highest dutiable value.

Top 15 countries from 2015 to 2022¶

Based on dutiable value in PHP¶

In [34]:
yearly_top15_countries_dv()

The above chart shows that the dutiable value of CHINA importations grew consistently from 2016 to 2020 but started decreasing slightly from 2021 onwards. It has remained the top importing country since 2016. It breached the 1 trillion PHP mark in 2018.

While the USA importations have not increased but maintained between 300 to 400 Billion PHP.

In [35]:
top15_countries_increases_dv()

JAPAN, the second highest importing country, is far below CHINA in terms of the total dutiable value of importations.

It is interesting to note also that the PH increased importations from CHINA, JAPAN, KOREA and HONG KONG in 2020, the start of the COVID-19 pandemic year, while the other top countries showed a decrease in importations.

Based on duties and taxes¶

In [36]:
yearly_top15_countries_dt()

Are there significant changes in imported rice consumption and trends before and after RTL implementation?¶

Volume of Rice Importation¶

In [37]:
rice_volume()

Duties and Taxes of Rice Importation¶

In [38]:
rice_dt()

The Rice Tariffication Law (RTL) which took effect in March 2019 replaced the quantitative restrictions on rice importations with 35% to 40% tariff. From the graphs above, rice importations from Viet Nam increased while rice importations from Thailand decreased. Alongside it, the duties and taxes of Viet Nam rice importations jumped 2.5 times from 4.5 Billion pesos to 12.6 Billion pesos after a year of implementing the RTL.

In [39]:
rice_costperkilo()

Since 2019, the importation cost of rice from Viet Nam has gone down compared to Thailand's rice importation cost.

The cost of rice importation ranged between 20 to 28 pesos per kilogram. However, rice from Pakistan and Myanmar registered even lower importation cost since 2017 and 2019, respectively.

Total Rice Importation versus Population Growth¶

In [40]:
rice_per_capita()

The above graph shows the Philippine rice importation in Metric Tons from 2015 to 2022 and the population count in 2015 and 2020. It's interesting to note that back in 2015, the ratio of rice importation versus population is around 16.85 kilograms of imported rice per person. But in 2020, this ratio has increased to 20.55 kilograms per person.

Conclusion

The team's aim in conducting this study was to identify the trends, patterns, and insights regarding the performance of the Philippine importation during PRRD's term. Based on the team's detailed analyses, several insights were gathered that provided answers to the team's questions as elaborated further below.

The Duterte administration was known for its fight against illegal drugs, criminality, and corruption. Also, when PRRD took office in 2016, he pledged to shift the Philippines away from the United States in favor of China and Russia. However, towards his last two years in office, he had a change of heart that can probably be dated to as early as June 2020 because of the amped-up assertiveness of Beijing in the South China Sea like the incidents involving hundreds of Chinese militia boats.

Data analysis of the importations from 2015 to 2022 revealed massive growth of importations from China which broke the 1 trillion-peso total dutiable value in 2018 and peaked in 2020 but started decreasing since then. On the other hand, the importations from the United States remained stable between the 300 to 500 billion annual total dutiable value. The total dutiable value of importations from Japan was a far second from China reaching only 800 billion pesos in 2020.

Although the Duterte administration shifted its foreign policy relations to China and away from the United States, Duterte edged away from it during the last two years of his term. The effect of this can be seen from the start of the decrease in the total dutiable value of importations from China since 2020.

It is also important to note that during Duterte’s term, more than three fourths of the total dutiable value from importations came from Asian countries. Among these countries are ASEAN-member countries like Thailand, Indonesia, Malaysia and Singapore. The United States is the only western country to be among the top importers in the Philippines. When it comes to the assessed amount of taxes and duties from importations during Duterte’s term, almost the same group of countries were observed with the exception of Vietnam. With importation value at 19.59% of the total during PRRD’s presidency, China ranks first with a total assessment of 21.95% of the whole duties and taxes assessed from imports during Duterte’s term.

The Philippine importations during the first and second half of PRRD's term showed an opposite trend between dutiable value and duties and taxes. From 2016 to 2018, the total dutiable value had an average growth rate of 25% but the total duties and taxes remained flat at 14%. However, from 2019 to 2022, the total dutiable value showed an average decrease of 1%, while the total duties and taxes posted a 19% increase.

Based on the team's analysis, the importations during the first half of PRRD's term were attributed largely to the significant increase in the importations from China. Since the Philippines has a free trade agreement with China, the majority of the goods imported from China were exempted from paying duties which therefore drove the effective rate of duties and taxes down. On the other hand, the increasing and all-time-high breaches in total duties and taxes assessed were mainly due to the appointment of the new BOC Commissioner Guerrero in 2019 amidst the COVID-19 pandemic. Guerrero's implementation of his 10-point priority program that included fully automating the transactions in the frontline and the pandemic that made it urgent for everyone to immediately embrace the online filing and processing initiatives of the BOC resulted in a notable and remarkable increase in the total duties and tax assessments and reduction in graft and corruption. Such a turn of events also resulted in another milestone that was reached where the total duties and taxes assessed after June 2020 breached 68 billion and never went below it again. This is an example of digitalization initiatives that the next term should prioritize and capitalize on to continue the positive and increasing trend in duties and taxes that the government collects.

In terms of chapters/groups of products being imported, the electrical machinery and mineral fuels chapters have consistently bagged the top two spots since 2015. Electrical machinery was largely imported from China so the effective duties and taxes were lower than that of the mineral fuels since the latter had higher duties and excise tax rates, in addition to VAT.

The Rice Tariffication Law also took effect during PRRD’s administration in March 2019. It effectively changed the quantitative restrictions on rice importation to a 35%-40% tariff rate. It also created the Rice Competitiveness Enhancement Fund (RCEF) or Rice Fund with a P10 billion annual appropriation for the next six years to be allocated and disbursed for rice farm machinery and equipment, rice seed development, propagation, and promotion, expanded rice credit assistance and rice extension services.

The analysis of rice importations showed that the volume increased 2.5 times since 2018 and annual total duties and taxes exceeded 10 billion pesos which are more than enough to support the RCEF or Rice Fund. Most of the imported rice came from Viet Nam totaling 2.5 million Metric Tons in 2021. On the other hand, there was a significant drop in the volume of imported rice from Thailand after 2018.

Relating rice importations with Philippine census data to the total population in 2015 and 2020, the ratio of imported rice per person increased from just 16.85 kilograms to 20.55 kilograms. The analysis also showed that although the cost of imported rice started decreasing since 2020, it is still above 23 pesos per kilogram. Pakistan and Myanmar have lower costs of imported rice that reached 20 pesos per kilogram.

Recommendations

Considering the varying insights obtained from the analyses above, the team has formed the following recommendations:

Recommendations to BOC¶

  1. Data Quality: The BOC should consider setting stricter and more robust validation checks in the online goods declaration form. The BOC should put validation checks for each importation record being submitted by the importer (e.g., ensure that no blank inputs for required fields (such as COUNTRY_EXPORT), standardize country selection for both COUNTRY_EXPORT (full name) and COUNTRY_ORIGIN (2 characters), and no individual importation should exceed 1 trillion PHP (or the max allowable import declaration), among others).

  2. Customs Memorandum Orders: The BOC should evaluate customs memorandum orders carefully, especially if they affect electronic shipments.

  3. Continuity of projects with the new administration: The BOC should continue and improve the online filing of goods declaration since it had proven to be successful and has led to almost double the collection of duties and taxes than before such initiatives were implemented. It has also led to a reduction of corruption in the bureau.

  4. Application Programming Interface (API): The BOC should consider using an API for sharing importation data to external parties so that the data correction and updates will be reflected in real-time. For data files already on the website, BOC should put data field description and notes for each monthly file. Also, historical files should be regularly updated to reflect data corrections. Date time fields like the date of declaration/entry, date of assessment, and date of payment should be added back just like in the previous years to allow more explorative analysis of time series data. Port codes should also be put back to allow the analysis of port importation performance.

Recommendations for further studies/analysis¶

  1. Rice importation source: Encourage rice importation from Pakistan and Myanmar which have significantly lower effective importation costs than Viet Nam and Thailand rice shipments. This may further drive down the cost of rice in the Philippines. Also, the country needs to diversify its source of imported rice rather than relying entirely on Viet Nam.

  2. Boost trade with countries other than China: Trade imports from Japan and Korea should be further boosted to get closer to China's figures on the dutiable value of imports.

  3. Focus on the top 10 chapters: Knowing that at least 70% of the total dutiable value and duties and taxes are coming from the top 10 chapters, the relevant government agencies should prioritize thinking about how to potentially increase duties and taxes collection by studying those that were on the top 10 chapters based on dutiable value, but were not in the top 10 in terms of duties and taxes assessed. Based on such a trend in the top chapters, in combination with the export data, the government should also focus on encouraging investors to set up manufacturing and trading sites in the Philippines by including the industries affected in the Strategic Investment Priority Plan and others, as applicable.

  4. Redirect importations: The relevant government agencies should consider proactively studying and assessing the statistics and trends in the effective cost of imported goods. If there are cheaper, same quality sources of imported goods, the DTI, or any other agencies should encourage the importers to source from those countries instead to help drive down the cost of products in the Philippines.

Data Assumptions and Limitations¶

Although the team is confident about the findings in this report, it is important to address our assumptions and this study's limitations.

Based on our research, we found out that the publicly available customs importation dataset includes values that are based on the assessment date and information. Those columns with "PAID" suffixes in dutiable value, duties, taxes, VAT, and other columns do not actually mean paid but rather the amount that has been assessed based on the assessment date. Such an amount will only be paid after a certain number of days from the assessment date.

Moreover, though the team would have preferred to compare same-period datasets between census and customs importation data, the relevant census data that were available and were extracted only includes 2015 and 2020 values. In that case, we were only able to compare the 2015 and 2020 census data with customs importation data.

Lastly, for ease in comparing periodic trends, the team decided to use only the Philippine Peso value of the importations. Hence, this study excludes the impact of inflation and changes in exchange rates.

References

  1. BOC Harnesses Technology to Enhance Systems and Processes. (2019, September 2). Bureau of Customs. https://customs.gov.ph/boc-harnesses-technology-to-enhance-systems-and-processes/
  2. Grossman, D. (2021, November 2). Duterte's Dalliance with China Is Over. RAND Corporation. https://www.rand.org/blog/2021/11/dutertes-dalliance-with-china-is-over.html
  3. Importation of Pets (Animals) and Household Plants. (2022, October). Bureau of Customs. https://customs.gov.ph/pet-animals-and-household-plants/
  4. Malindog-Uy, A. (2021, December 27). Duterte Legacy: China Relations And Foreign Policy. The Asean Post. https://theaseanpost.com/article/duterte-legacy-china-relations-and-foreign-policy
  5. Online Filing of Goods Declaration. (2020, March 24). Bureau of Customs. https://customs.gov.ph/wp-content/uploads/2020/03/ocom-memo-61-2020-Online_Filing_of_Goods_Declaration.pdf
  6. Philippine Merchandise Imports from The World (in US$) Historical Trend. (2022, October). Trade Statistics - Tradeline Philippines. http://www.tradelinephilippines.dti.gov.ph:8080/imports
  7. Philippine Tariff Finder. (2022, October). Republic of the Philippines, Tariff Commission. http://finder.tariffcommission.gov.ph/
  8. Reports of Import Entries. (2022, October). Bureau of Customs Website. https://customs.gov.ph/import-reports/
  9. Rice Tariffication Law Is the Best Model That We Have to Help Both Farmers and Consumers. (2022, May 19). Republic of the Philippines, National Economic And Development Authority. https://neda.gov.ph/rice-tariffication-law-is-the-best-model-that-we-have-to-help-both-farmers-and-consumers-neda/#:~:text=Republic%20Act%20No.,funded%20by%20the%20tariff%20revenues
  10. Tariff Book. (2022, October) Republic of the Philippines, Tariff Commission. https://tariffcommission.gov.ph/tariff-book
  11. 2020 Census of Population and Housing (2020 CPH) Population Counts Declared Official by the President. (2021, July 7). Philippine Statistics Authority. https://psa.gov.ph/content/2020-census-population-and-housing-2020-cph-population-counts-declared-official-president